PREFACE


The  purpose  of  this  study  is  to  provide  an  introduction  to  a  class  of  mathematical 
techniques  required  to  treat  the  problems  that  arise  in  the  planning  of  multistage  processes, 
many  of  which  are  of  day-to-day  importance  to  military  and  other  government  workers. 
These are programming problems, to use the terminology currently popular, and I have
introduced the adjective "dynamic" to indicate that they are problems in which time
plays an important role and in which the order of performance of operations is
all-important. This differentiation is not merely one of nomenclature, but is definitely
conceptual,  and  we  shall  see  that,  properly  interpreted,  it  furnishes  us  with  a  powerful 
mathematical  tool  with  which  to  treat  these  problems. 

The  multistage  processes  in  which  we  are  interested  are  composed  of  sequences  of 
operations  in  which  the  outcome  of  the  preceding  operations  may  be  used  to  guide  the 
course  of  the  future  ones.  There  are  two  types  of  operations  that  we  can  distinguish 
immediately:  those  in  which  the  outcome  is  completely  determined,  and  those  in  which 
the  outcome  is  predictable  on  the  basis  of  a  probability  distribution.  Depending  on  the 
point of view, either type may be considered to be an approximation to the reality
represented by the other. Although we shall see that mathematically the two viewpoints are
not  far  apart,  in  any  practical  situation  the  two  philosophies  may  clash  violently. 

Any  realistic  treatment  of  investment  and  replacement  theory,  of  scientific  sampling 
and  testing,  of  learning  theory,  of  industrial  production  problems — to  mention  only  a  few 
areas  of  importance — must  involve  to  a  greater  or  lesser  extent  problems  of  dynamic 
programming.  From  this  it  follows  that  however  important  planning  has  been  in  the 
past  in  the  face  of  the  riddles  of  an  uncertain  future,  it  must  inevitably  assume  a  role  of 
greater  and  greater  importance  as  an  increasing  population  with  increasing  technological 
demands  faces  the  challenge  of  a  world  with  shrinking  resources. 

The  theory  has  particular  relevance  to  government  planning,  ranging  in  scope  from 
the study of actual operations to questions of the procurement and replacement of
equipment and to problems of the training and allocation of personnel.

Since  most  of  the  problems  that  arise  are  of  an  entirely  novel  type  frequently  offering 
formidable  mathematical  difficulties,  we  shall  restrict  ourselves  to  a  consideration  of 
the  simplest  problems  possessing  a  germ  of  reality  in  order  not  to  obscure  by  extraneous 
analytic  and  algebraic  complications  the  techniques  we  employ. 

The  realistic  problems  that  confront  the  theory  of  dynamic  programming  are  in  order 
of  complexity  on  a  par  with  the  three-body  problem  of  classical  dynamics,  whereas  the 
theory  painfully  scrambles  to  solve  problems  on  a  level  with  that  of  the  motion  of  a 
freely  falling  particle.  Nonetheless,  there  is  no  cause  for  discouragement.  Consider  the 
case of the nuclear physicist. In attempting to explain the behavior of heavy atoms, he is
forced to treat of an n-body problem infinitely more complex than the above
only-partially-solved astronomical problem. Nevertheless, by combining the exact results of
the one-body problem and the two-body problem with the results of experiment and
observation,  he  is  able  to  construct  an  imposing  theoretical  structure,  albeit  one  with  an 
occasional  blind  alley  or  barred  window,  that  is  amazingly  useful  in  predicting  and 
explaining  experimental  results. 

Similarly,  in  discussing  the  exceedingly  involved  planning  problems  of  economic  life, 
further  complicated  by  sociological  and  psychological  problems,  we  must  combine  in  a 
skillful  fashion  the  exact  results  of  the  simple  models  with  the  intuitive  theory  derived 
from  experience.  The  fashion  in  which  this  is  to  be  done  is  beyond  the  power  of  the 
mathematician to describe. It must be realized that however elegant the mathematical
theory, however consistent and economical its axioms, eventually the point of
metamathematics will be reached at which someone will have to say, "I prefer this theory."

In  order  not  to  increase  unduly  the  size  of  this  study,  I  have  been  forced  to  omit  any 
mention  of  a  number  of  important  and  interesting  investigations  and  to  include  only  a 
part  of  the  results  known  concerning  the  topics  included. 

To  begin  with,  I  have  not  included  any  treatment  of  the  mathematical  theory  of 
learning  as  formulated  by  R.  Bush  and  F.  Mosteller,  jointly,  and  by  M.  Flood.  Extensive 
results  in  this  field  have  been  obtained  by  T.  Harris,  H.  N.  Shapiro,  and  the  author,  and, 
independently,  together  with  generalizations,  by  S.  Karlin. 

Nor  have  I  included  results  recently  obtained  by  S.  Johnson  and  S.  Karlin  concerning 
processes in which the distribution of outcomes is only partially known. These are
problems of great importance in statistical applications and arise in other connections as well.
A  description  of  problems  of  this  type  will  be  found  in  an  expository  paper  by  H.  Robbins. 

Because  of  the  difficulty  of  adequately  summarizing  his  results  in  any  brief  space,  no 
mention  has  been  made  of  the  extensive  theory  of  pursuit  games  created  by  R.  Isaacs. 
These  games  are  related  to  the  games  of  survival  briefly  discussed  in  Chapter  6.  Both 
types  of  games  belong  to  the  general  class  of  multistage  games,  which  has  not  been 
touched  upon  here,  although  there  are  many  interesting  results  known  concerning  these 
games,  as,  for  example,  the  results  of  D.  Blackwell  and  the  author  concerning  games  of 
bluffing  and  elimination  of  randomization  and  the  related  results  of  A.  Dvoretzky, 
A. Wald, and J. Wolfowitz.

Finally,  I  have  not  included  the  recent  investigations  of  I.  Glicksberg,  O.  Gross,  and 
myself  concerning  the  important  and  novel  variational  problems  that  arise  in  connection 
with  problems  of  economic  and  mechanical  control. 

In  connection  with  the  computational  aspects  of  the  theory  of  dynamic  programming, 
I have not discussed any applications of the "simplex" method of G. Dantzig that has
proved  of  such  great  value  in  the  theory  of  linear  programming  and  yields  the  solution 
of  many  important  classes  of  dynamic  programming  problems. 

It  is  a  pleasure  to  acknowledge  my  indebtedness  to  a  number  of  sources:  First,  to  the 
von  Neumann  theory  of  games,  as  developed  by  J.  von  Neumann,  O.  Morgenstern,  and 
others, which shows how to treat by mathematical analysis vast classes of problems
formerly thought far out of the reach of the mathematician — and relegated, therefore, to the
limbo  of  imponderables — and,  simultaneously,  to  the  Wald  theory  of  sequential  analysis, 
as  developed  by  A.  Wald,  D.  Blackwell,  A.  Girshick,  J.  Wolfowitz,  and  others,  which 
shows  the  vast  economy  of  effort  that  may  be  effected  by  the  proper  consideration  of 
multistage testing processes; second, to a number of colleagues and friends who have
discussed  various  aspects  of  the  theory  with  me  and  have  contributed  greatly  to  its 
clarification  and  growth. 

In  particular,  the  last  section  of  Chapter  6  is  taken  verbatim  from  an  unpublished 
paper  by  M.  Peisakoff.  Section  2.12  is  from  an  unpublished  paper  by  S.  Karlin  and 
H. N. Shapiro, as is also Section 3.12, while Section 3.11 is based on a personal
communication from H. N. Shapiro. A partial solution of the problem in Section 3.11 had
previously  been  given  by  O.  Gross,  using  a  different  approach.  The  solution  of  Eq.  (3.1) 
was  obtained  while  collaborating  with  M.  Shiffman;  the  solution  of  Eq.  (5.45)  was 
obtained  in  collaboration  with  D.  Blackwell;  and  the  formulation  in  mathematical  terms 
of  games  of  survival  was  obtained  in  collaboration  with  J.  LaSalle. 

The  optimal  inventory  problem  mentioned  in  Chapter  1  and  discussed  briefly  in 
Chapter  4  was  first  studied  by  K.  Arrow,  T.  Harris,  and  J.  Marschak.  Following  this,  an 
extensive  treatment,  together  with  many  generalizations,  was  given  by  A.  Dvoretzky, 
J.  Kiefer,  and  J.  Wolfowitz. 

I  should  like  to  thank  Oliver  Gross,  who  read  the  final  manuscript  through  with  great 
care  and  made  a  large  number  of  valuable  suggestions  and  corrections. 

Finally, I should like to record a particular debt of gratitude to O. Helmer and E. W.
Paxson,  who  early  appreciated  the  importance  of  the  study  of  multistage  processes  and, 
in  addition  to  furnishing  a  large  number  of  stimulating  problems  arising  naturally  in 
various  important  applications,  constantly  encouraged  me  in  my  researches. 


SUMMARY 


Dynamic  programming  is  a  mathematical  theory  devoted  to  the  study  of  multistage 
processes.  The  multistage  processes  discussed  in  this  report  are  composed  of  sequences  of 
operations  in  which  the  outcome  of  those  preceding  may  be  used  to  guide  the  course  of 
future  ones.  Operations  of  both  deterministic  and  stochastic  types  are  discussed. 

After an introductory chapter, in which a number of representative problems are
investigated, and a succeeding chapter, in which some general mathematical results are
obtained,  the  remainder  of  the  report  is  devoted  to  the  study  of  equations  of  particular 
types  that  arise  in  various  applications  of  the  theory. 

CHAPTER 1

FUNDAMENTAL  CONCEPTS 


1.1.  Introduction 

We  propose  in  this  chapter  to  discuss  a  number  of  representative  problems  in  the 
theory  of  dynamic  programming,  emphasizing  their  conceptual  and  analytic  aspects.  In 
place  of  any  discussion  of  an  abstract  type,  which  at  this  stage  would  necessarily  remain 
rather  vague,  we  shall  begin  the  chapter  by  posing  a  number  of  simple  prototypes  of 
general  problems  that  fall  within  the  domain  of  our  theory.  Following  this  we  shall 
present  various  mathematical  approaches  to  these  problems  and  introduce  the  reader  to 
the  functional  equation  technique  which  we  shall  employ  throughout  most  of  the  study. 
Since  both  the  class  of  problems  we  shall  encounter  and  the  techniques  we  shall  employ 
possess  certain  features  of  novelty,  we  shall  not  hesitate  to  be  repetitious  to  a  certain 
extent,  feeling  that,  in  an  introductory  work,  sins  of  repetition  are  of  lesser  magnitude 
than  sins  of  omission. 

1.2.  Some  Problems 

Problem 1.1. We are given a quantity $x > 0$ that may be divided into two parts,
$y$ and $x - y$. From $y$ we obtain a return of $g(y)$; and from $x - y$, a return of $h(x - y)$.
In so doing we expend a certain amount of our original resources and are left with a
new quantity, $ay + b(x - y)$, $0 < a, b < 1$, with which to continue the process. How
does one proceed so as to maximize the total return obtained in a finite, or unbounded,
number of stages?

Problem  1.2.  We  are  given  a  quantity  x  >  0  that  is  to  be  utilized  to  accomplish  a 
certain task. If an amount y, where 0 < y < x, is used on any single attempt, the
probability of success is a(y). If the task is not accomplished on the first try, we continue with
the  new  quantity  x  —  y.  How  does  one  proceed  in  order  to  maximize  the  over-all 
probability  of  success? 

Problem  1.3.  We  are  informed  that  a  particle  is  in  either  state  0  or  1,  and  we  are 
given  initially  the  probability  x  that  it  is  in  state  1.  Use  of  the  operation  A  will  reduce 
this  probability  to  ax,  where  a  is  some  positive  constant  less  than  1,  whereas  operation  L, 
which  consists  in  observing  the  particle,  will  tell  us  definitely  which  state  it  is  in.  If  it  is 
desired  to  transform  the  particle  into  state  0  in  a  minimum  time,  what  is  the  optimal 
procedure? 

Problem 1.4. At each stage of a sequence of actions we are allowed our choice of one
of two actions. The first has associated a probability $p_1$ of gaining one unit, a probability
$p_2$ of gaining two units, and a probability $p_3$ of terminating the process. The second has a
similar set of probabilities $p_1'$, $p_2'$, $p_3'$. What sequence of choices maximizes the probability
of attaining at least $n$ units before the process is terminated?

Problem 1.5. We are fortunate enough to possess two gold mines, A and B, the
first of which possesses an amount $x$ of gold, while the second possesses an amount $y$. If
the only gold-mining machine we have is used in A, there is a probability $p_1$ that $r$ per
cent of the gold there will be brought up safely, the machine still being usable, and a
probability $(1 - p_1)$ that the machine will be damaged, will mine no gold, and will be
of no further use. Similarly, mine B has the probabilities $p_2$ and $(1 - p_2)$ associated with
it, $s$ per cent of the gold there being brought up on a success. How does one proceed in
order to maximize the total amount of gold obtained before the machine is defunct?

Problem  1.6.  Let  us  consider  the  above  problem  in  the  case  in  which  we  know  only 
the  expected  amounts  of  gold  in  each  mine  and  the  expected  amount  mined  each  time, 
without  being  able  to  observe  the  results  of  individual  operations. 

Problem 1.7. Two players, A and B, the first possessing $x$ dollars and the second
possessing $y$ dollars, play a modified coin-tossing game described by the matrix

$$M = \begin{pmatrix} 1 & -1 \\ -2 & 2 \end{pmatrix},$$

whose entries are the amounts won by A. Assuming that each player is motivated by a desire to ruin the other, how does each play?

1.3. Enumerative Solutions — Deterministic Case

Having  posed  the  problems,  let  us  now  consider  what  we  shall  accept  as  an  answer. 
Clearly,  what  we  desire  is  a  rule  that,  when  given  the  initial  parameters,  the  allowable 
operations,  and  their  outcomes,  yields  a  sequence  of  actions  which  achieves,  or  attempts  to 
achieve,  a  designated  goal  in  a  fashion  that  is  optimal  in  some  previously  specified  sense. 

One  way  of  obtaining  this  rule  is  to  list  all  possible  rules,  calculate  the  effect  of  each, 
and then choose the one best suited for our purposes. This method we call the
"enumerative" method.

Let us pursue this technique in the case of Problem 1.1, which is deterministic,
beginning with the case in which exactly $N$ operations are permitted. Let $y_1, y_2, \ldots, y_N$ be the
sequence of choices. The total return will be

$$R(y_1, y_2, \ldots, y_N) = \sum_{i=1}^{N} g(y_i) + \sum_{i=1}^{N} h(x_i - y_i), \tag{1.1}$$

where the variables are constrained by the conditions

$$\begin{aligned}
&\text{(a)}\quad 0 \le y_i \le x_i, \\
&\text{(b)}\quad x_1 = x, \\
&\phantom{\text{(b)}}\quad x_2 = a y_1 + b(x_1 - y_1), \\
&\phantom{\text{(b)}}\quad \;\vdots \\
&\phantom{\text{(b)}}\quad x_N = a y_{N-1} + b(x_{N-1} - y_{N-1}).
\end{aligned} \tag{1.2}$$

The problem is now to maximize $R$ subject to the above constraints. Since several of
the $y_i$ may be 0 or $x_i$ (end points of the allowable intervals), any naive application of
calculus is somewhat hazardous.

For $N$ infinite, which is to say that an unbounded number of operations are permitted,
the problem is one involving an infinite number of variables, and rather more discussion
is required. Let us observe that the case of infinite $N$, which is meaningless in any
practical situation, possesses a very important invariance property from the mathematical point
of view, since after any finite number of stages there still remains a process with an
infinite number of stages. This fact will be of great utility in our subsequent discussion.
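
The enumerative method for the N-stage version of Problem 1.1 can be sketched in a few lines. This is only an illustration under stated assumptions: the continuous choice of each y is replaced by a small hypothetical set of split fractions, and the return functions, the constants a and b, and the name `enumerate_allocations` are illustrative choices, not taken from the text.

```python
from itertools import product
import math

def enumerate_allocations(g, h, a, b, x, n_stages, fractions):
    """Try every sequence of split fractions; return (best_return, best_sequence)."""
    best_value, best_seq = -math.inf, None
    for seq in product(fractions, repeat=n_stages):
        x_i, total = x, 0.0
        for frac in seq:
            y_i = frac * x_i                  # constraint (a): 0 <= y_i <= x_i
            total += g(y_i) + h(x_i - y_i)    # one term of each sum in (1.1)
            x_i = a * y_i + b * (x_i - y_i)   # next quantity, as in (1.2)
        if total > best_value:
            best_value, best_seq = total, seq
    return best_value, best_seq

# Assumed example data: g = h = sqrt, a = b = 0.5, five candidate fractions.
fractions = [k / 4 for k in range(5)]
value, seq = enumerate_allocations(math.sqrt, math.sqrt, 0.5, 0.5, 1.0, 3, fractions)
count = len(fractions) ** 3   # number of sequences examined
```

Even with five candidate fractions per stage the number of sequences grows as a power of the number of stages, which is the practical weakness of enumeration.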

1.4.  Enumerative  Solutions — Stochastic  Case 

In Problems 1.2 through 1.7 the outcome of any action is indeterminate, specified only
by a distribution function, which we take to be known. Problems of a second order of
difficulty, overlapping the domain of sequential analysis, are those in which the
distribution function is only partially known. Third-order problems would perhaps be those in
which it is not known whether or not a distribution function exists. We see from this
brief listing that it is possible to construct a hierarchy of problems ranging from the
blissful state of complete determinacy to the inferno of utter ignorance. In this
introductory treatment we shall consider only first-order problems.

In order to understand what an enumerative solution of a stochastic decision problem
involves, let us discuss Problem 1.4, considering the simple case in which only two
stages are allowed.

In the general case in which $N$ stages are allowed, we require $2 \cdot 4^{N-1}$ listings in order
to enumerate all possible rules. If $N$ is infinite, which is to say that the process is
allowed to continue until it terminates of itself, the number of possible rules is
non-enumerable.* This fact will make any direct application of the enumerative method
somewhat tedious of execution.

The  possible  sequences  of  choices  may  be  illustrated  graphically  by  means  of  a  tree, 
as  shown  in  Fig.  1.1. 


Fig.  1.1 

The eight possible rules are

$$\begin{aligned}
&A(1)A, \quad A(2)A, \quad A(1)B, \quad A(2)B, \\
&B(1)A, \quad B(2)A, \quad B(1)B, \quad B(2)B,
\end{aligned} \tag{1.3}$$

where, for example, $A(i)B$ means that A is chosen first, $i$ units are obtained, and B is
then chosen. If the process terminates with an initial choice of A or B, there is, of course,
no further need for decision.

* We recall that an infinite process which allows one of two choices at each stage yields a set of
possible sequences that may be put into one-to-one correspondence with the dyadic expressions of the real
numbers in [0, 1].

We  now  require  a  method  for  comparing  the  outcomes  of  different  rules.  Since  we  are 
dealing  with  stochastic  sequences,  let  us  use  the  metric  of  probability  theory  and  consider 
the expected return. In general, let us note, it is not the expected return that is important,
but rather the expected value of some function of the total return. In the case of Problem
1.4, this function has the form, if $R$ is the return,

$$f(R) = \begin{cases} 1, & R \ge n, \\ 0, & R < n, \end{cases} \tag{1.4}$$

since the expected value of $f(R)$ is precisely the probability that $R \ge n$.

It  is  now  not  difficult  to  calculate  the  desired  expected  value  and  to  compare  the  eight 
numbers  obtained  in  this  way  to  obtain  the  optimal  policy.  Although  feasible  for  small  N, 
this  technique  is  impossible  of  execution  for  N  of  even  moderate  size. 

We  shall  see,  subsequently,  that  the  enumerative  method  possesses  theoretical  value  in 
some  cases  and  computational  value  in  others.  In  general,  however,  it  is  inferior  to  the 
method  we  shall  employ  throughout  most  of  the  study. 
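
As a rough illustration of the two-stage enumeration for Problem 1.4, the sketch below scores every decision rule by the probability that the return reaches n. The probabilities in `PROBS`, the target n = 3, and the function name are hypothetical values chosen for the example. A rule is represented here as a first choice together with a second choice for each possible first gain, which also gives eight cases.

```python
from itertools import product

# Hypothetical probabilities (gain 1, gain 2, terminate) for the two actions.
PROBS = {"A": (0.5, 0.2, 0.3), "B": (0.2, 0.5, 0.3)}

def success_probability(first, second_after, n):
    """P(R >= n) for a two-stage rule: `first`, then second_after[i] after gaining i."""
    p_gain = PROBS[first]
    total = 0.0
    for i in (1, 2):
        if i >= n:
            reach = 1.0                       # goal already met after stage one
        else:
            q_gain = PROBS[second_after[i]]
            reach = sum(q_gain[j - 1] for j in (1, 2) if i + j >= n)
        total += p_gain[i - 1] * reach
    return total

# All eight rules: first choice, second choice after gain 1, second choice after gain 2.
rules = [(first, {1: s1, 2: s2}) for first, s1, s2 in product("AB", repeat=3)]
n = 3
scored = [(success_probability(first, after, n), first, after) for first, after in rules]
best_prob, best_first, best_after = max(scored, key=lambda item: item[0])
```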


1.5.  Enumerative  Approach — II 

The  problems  above  lead  to  a  complicated  enumeration  of  cases  because  of  the  fact 
that  a  policy  consists  not  merely  in  a  selection  of  choices  of  A  or  B,  but  actually  in  a 
selection coupled with actual occurrences. Hence, in place of the four policies AA, AB,
BA, BB for the two-stage process, we have the eight policies of the form $A(i)B$, $B(j)A$
to consider.

In Problem 1.6, in which the results of an individual choice cannot be ascertained, we
need  consider,  in  the  two-stage  process,  only  the  four  choices  AA,  AB,  BA,  BB.  Let  us 
observe  that  a  policy  such  as  AB  is  to  be  interpreted  to  mean  that  B  is  used  on  the  second 
trial,  if  the  machine  survives  the  first  trial. 

It is interesting to note that analytically there will be no difference between (1) the
above  problem,  in  which  we  do  not  know  the  precise  outcome  of  any  individual  action, 
(2)  a  similar  problem  in  which  we  do  observe  the  effect  of  each  choice,  provided  that 
the  effects  are  of  sufficiently  simple  type,  and  (3)  a  similar  completely  deterministic 
problem. 

The  enumerative  approach  here  leads  to  a  very  interesting  geometric  treatment  of  the 
problem,  which  we  shall  present  later  in  Chapter  3. 

1.6. The Functional Equation Approach

Let  us  begin  by  observing  that  the  problems  posed  above  have  the  following  features 
in  common: 


1. The state of the system is described by a small set of parameters.

2. The effect of a decision is to transform this set of parameters into a similar set.

3. The past history of the system is of no importance in determining future actions,
a Markovian property.

We have purposely left this description rather vague, since we feel it is the spirit of
the problem rather than the letter that is significant. It is extremely important to realize
that one cannot axiomatize mathematical formulation and legislate away ingenuity. In some
problems the state variables are forced on one; in others there is a choice, and the
mathematical solution will stand or fall depending on the choice that is made. Experience
alone helps in the setting up of useful mathematical models.

In addition to the above facts, we require the following simple PRINCIPLE OF
OPTIMALITY: An optimal policy has the property that whatever the initial state and
initial decision are, the remaining decisions must constitute an optimal policy with regard
to the state resulting from the first decision.

We shall now apply this principle to obtain functional equations whose solutions will
yield the optimal strategies.

Problem 1.1. Let us set

f(x) = total return obtained using an optimal policy of allocation of
resources at each stage, where an unlimited number of operations is
permitted. (1.5)

If the initial allocation is $y$ and $x - y$, the return from this division will be
$g(y) + h(x - y)$, with $ay + b(x - y)$ remaining to continue the process. From the
definition of $f(x)$, paying heed to our fundamental principle, above, it follows that the
total return from $ay + b(x - y)$ will be $f[ay + b(x - y)]$. Consequently, the total
return derived from an initial allocation of $y$ and $x - y$ will be

$$R(y) = g(y) + h(x - y) + f[ay + b(x - y)]. \tag{1.6}$$

The maximum return will be obtained if $y$ is chosen to maximize $R(y)$. Since this
maximum return is, by definition, $f(x)$, we obtain the functional equation

$$f(x) = \max_{0 \le y \le x} \{ g(y) + h(x - y) + f[ay + b(x - y)] \}. \tag{1.7}$$

Since we have no a priori assurance that $f(x)$ is continuous, even if $g$ and $h$ are, it is
better to write

$$f(x) = \sup_{0 \le y \le x} \{ g(y) + h(x - y) + f[ay + b(x - y)] \} \tag{1.8}$$

and then to prove, under certain assumptions, that the supremum is actually attained.

For the $N$-stage process, we have, using an obvious notation and taking $g$ and $h$ to be
continuous,

$$\begin{aligned}
f_1(x) &= \max_{0 \le y \le x} [g(y) + h(x - y)], \\
f_N(x) &= \max_{0 \le y \le x} \{ g(y) + h(x - y) + f_{N-1}[ay + b(x - y)] \}, \quad N = 2, 3, \ldots.
\end{aligned} \tag{1.9}$$

We now see the advantage of the mathematical fiction of taking $N$ infinite. In place
of the sequence of functions, $\{f_N(x)\}$, given by (1.9), we have one function $f(x)$
satisfying (1.8). There is, naturally, a close connection between the sequence $\{f_N(x)\}$ and
$f(x)$, which we shall subsequently exploit.

Having a functional equation for $f(x)$ that under certain simple natural conditions
determines $f(x)$ uniquely, as we shall see, the question naturally arises as to how this
function is to be used to determine an optimal policy. Turning to (1.7) we see that $y$ is
the quantity, or a quantity, that maximizes $g(y) + h(x - y) + f[ay + b(x - y)]$ in
$[0, x]$. This quantity $y$ is a known function of $x$ once $f(x)$ is found.

It  is  clear  then  that  there  is  an  equivalence  between  the  optimal  policy  and  the  solution 
of  the  functional  equation.  We  shall  subsequently  discuss  this  in  more  detail. 
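
The recurrence (1.9) and the way a maximizing y is read off from it can be sketched numerically. The code below is a minimal illustration, not the method of the text: it assumes a uniform grid of x values on [0, x_max], a finite set of candidate splits, and linear interpolation between grid points. The names `interpolate` and `n_stage_returns`, and the example choices g = h = sqrt, a = 0.5, b = 0.8, are all assumptions.

```python
import math

def interpolate(xs, fs, x):
    """Piecewise-linear interpolation of tabulated values fs on the uniform grid xs."""
    step = xs[1] - xs[0]
    k = min(int(x / step), len(xs) - 2)
    t = (x - xs[k]) / step
    return (1 - t) * fs[k] + t * fs[k + 1]

def n_stage_returns(g, h, a, b, x_max, n_stages, grid=101, splits=21):
    """Tabulate f_1, ..., f_N of (1.9) together with a maximizing y for each grid x."""
    xs = [x_max * i / (grid - 1) for i in range(grid)]
    f_prev = [0.0] * grid                     # f_0 = 0: no stages left, no return
    tables = []
    for _ in range(n_stages):
        f_new, policy = [], []
        for x in xs:
            best_val, best_y = -math.inf, 0.0
            for j in range(splits):
                y = x * (j / (splits - 1))    # candidate allocation in [0, x]
                rest = a * y + b * (x - y)    # quantity left for later stages
                val = g(y) + h(x - y) + interpolate(xs, f_prev, rest)
                if val > best_val:
                    best_val, best_y = val, y
            f_new.append(best_val)
            policy.append(best_y)
        tables.append((f_new, policy))
        f_prev = f_new
    return xs, tables

xs, tables = n_stage_returns(math.sqrt, math.sqrt, 0.5, 0.8, 1.0, 3)
f1, y1 = tables[0]
f2, y2 = tables[1]
```

Each stage needs only the table from the previous stage, so the work grows linearly with the number of stages rather than exponentially as in direct enumeration.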

Problem 1.2. Let us set, in similar fashion,

f(x) = over-all probability of success using an optimal procedure. (1.10)

If we use an amount $y$ on the first try, our probability of success is $a(y)$. If we fail on
the first try, an occurrence with probability $[1 - a(y)]$, we use an optimal policy starting
with the residual amount $x - y$. Hence, $f(x)$ satisfies the relation

$$f(x) = \max_{y} \{ a(y) + [1 - a(y)] f(x - y) \}. \tag{1.11}$$

The problem is much simpler mathematically if we consider the probability of failure
rather than the probability of success.
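
To see why the failure probability is easier to handle, one may write $u(x) = 1 - f(x)$ (the symbol $u$ is introduced here only for illustration) and substitute into (1.11):

$$\begin{aligned}
u(x) &= 1 - \max_{y} \{ a(y) + [1 - a(y)] f(x - y) \} \\
     &= \min_{y} \{ [1 - a(y)] - [1 - a(y)] f(x - y) \} \\
     &= \min_{y} \, [1 - a(y)] \, u(x - y).
\end{aligned}$$

The additive term disappears and the relation becomes purely multiplicative in $u$.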

Problem 1.3. Let

f(x) = expected time, using an optimal procedure, to transform the particle
into state 0, if the probability that it is initially in state 1 is x. (1.12)

If we observe the system, we find it in state 1 with probability $x$ and continue with that
knowledge; whereas, if we find it in state 0, the process terminates. Hence, if $f_L(x)$
denotes the expected time spent if we observe on the first move, we have

$$f_L(x) = 1 + x f(1). \tag{1.13}$$

On the other hand, if we act, we have

$$f_A(x) = 1 + f(ax). \tag{1.14}$$

Combining these two results, we see that

$$\begin{aligned}
f(x) &= \min \begin{bmatrix} 1 + x f(1) \\ 1 + f(ax) \end{bmatrix}, \quad 0 < x \le 1, \\
f(0) &= 0.
\end{aligned} \tag{1.15}$$
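
One way to get a feel for (1.15) is to unroll it: acting j times and then observing costs j + 1 moves plus, with probability a^j x, a fresh start from state 1. The sketch below rests on that unrolling, which is an assumption of this illustration rather than a derivation given in the text. It also assumes 0 < a < 1 and truncates the search at a hypothetical `max_steps`.

```python
def expected_time_from_one(a, max_steps=200):
    """f(1) under the assumed policy: act j >= 1 times, observe, repeat if needed."""
    return min((j + 1) / (1 - a ** j) for j in range(1, max_steps))

def expected_time(x, a, max_steps=200):
    """f(x) from unrolling (1.15): act j >= 0 times, then observe."""
    if x == 0:
        return 0.0                            # boundary condition f(0) = 0
    f_one = expected_time_from_one(a, max_steps)
    return min(j + 1 + (a ** j) * x * f_one for j in range(max_steps))

a = 0.5
f_at_one = expected_time(1.0, a)
f_at_x = expected_time(0.3, a)
# Right-hand side of (1.15) evaluated at x = 0.3, for comparison with f_at_x.
rhs_at_x = min(1 + 0.3 * f_at_one, 1 + expected_time(a * 0.3, a))
```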

Problem 1.4. Let

f(n) = probability of obtaining at least n, using an optimal procedure. (1.16)

Enumerating the possibilities relative to each choice, we obtain

$$f(n) = \max \begin{bmatrix} p_1 f(n-1) + p_2 f(n-2) \\ p_1' f(n-1) + p_2' f(n-2) \end{bmatrix}, \quad n \ge 2. \tag{1.17}$$

The reasoning behind this equation is as follows: If one obtains $k$ on the first step,
one continues so as to maximize the probability of obtaining at least $n - k$ on the
following steps. For $n = 1$, we have the same equation with the convention that $f(-k) = 1$,
$k \ge 0$.
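
The recurrence (1.17), with the convention f(-k) = 1 for k >= 0, translates directly into a short memoized function. The probabilities used below are hypothetical example values, and `make_f` is an illustrative name.

```python
from functools import lru_cache

def make_f(action_a, action_b):
    """Each action is a pair (p1, p2): probabilities of gaining one and two units."""
    @lru_cache(maxsize=None)
    def f(n):
        if n <= 0:
            return 1.0                        # convention f(-k) = 1 for k >= 0
        return max(p1 * f(n - 1) + p2 * f(n - 2)
                   for p1, p2 in (action_a, action_b))
    return f

# Assumed example: the first action favors one unit, the second favors two.
f = make_f((0.5, 0.2), (0.2, 0.5))
values = [f(n) for n in range(1, 4)]
```

In contrast with the two-stage enumeration, nothing here limits the number of stages; the termination probabilities enter only through p1 + p2 < 1.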

Problem 1.5. Let

f(x, y) = expected amount of gold obtained using an optimal sequence of
choices. (1.18)

If choice A is made, we have

$$f_A(x, y) = p_1 \{ r_1 x + f[(1 - r_1)x, y] \}; \tag{1.19}$$

whereas, if choice B is made, we obtain

$$f_B(x, y) = p_2 \{ s_1 y + f[x, (1 - s_1)y] \}, \tag{1.20}$$

where $r_1 = r/100$, $s_1 = s/100$.

Hence,

$$f(x, y) = \max \begin{bmatrix} p_1 \{ r_1 x + f[(1 - r_1)x, y] \} \\ p_2 \{ s_1 y + f[x, (1 - s_1)y] \} \end{bmatrix}, \quad x, y \ge 0. \tag{1.21}$$
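
A finite-stage approximation to (1.21) can be sketched as follows. The truncation to `n_stages` operations is an assumption of this illustration (the text's f allows the process to run until the machine fails), as are the function name and the numerical values in the usage lines. The state is indexed by how many times each mine has been worked successfully, since those counts determine the remaining amounts.

```python
from functools import lru_cache

def gold_value(x, y, p1, r1, p2, s1, n_stages):
    """Expected gold from at most n_stages operations, by recursion on (1.21)."""
    @lru_cache(maxsize=None)
    def f(i, j, stages_left):
        # A has been worked i times and B j times, the machine surviving each time.
        if stages_left == 0:
            return 0.0
        xi = x * (1 - r1) ** i                # gold left in A
        yj = y * (1 - s1) ** j                # gold left in B
        use_a = p1 * (r1 * xi + f(i + 1, j, stages_left - 1))
        use_b = p2 * (s1 * yj + f(i, j + 1, stages_left - 1))
        return max(use_a, use_b)
    return f(0, 0, n_stages)

# With B empty only A matters, and the value approaches p1*r1*x / (1 - p1*(1 - r1)).
only_a = gold_value(1.0, 0.0, 0.5, 0.5, 0.5, 0.5, 60)
both = gold_value(1.0, 1.0, 0.5, 0.5, 0.5, 0.5, 60)
```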


Problem 1.5'. Let us consider the same situation in which it is desired to maximize
not the expected value of the total return, $R$, but the expected value of $\phi(R)$, where $\phi$
is a given function.

In this case it is necessary to introduce another state variable, namely $z$, the amount
already mined. If we set

f(x, y, z) = expected value of $\phi(R)$ starting with an amount z, using an optimal policy, (1.22)

we obtain for $f$ the functional equation

$$\begin{aligned}
f(x, y, z) &= \max \begin{bmatrix} p_1 f[(1 - r_1)x, \, y, \, z + r_1 x] + (1 - p_1)\phi(z) \\ p_2 f[x, \, (1 - s_1)y, \, z + s_1 y] + (1 - p_2)\phi(z) \end{bmatrix}, \quad x, y \ge 0, \\
f(0, 0, z) &= \phi(z).
\end{aligned} \tag{1.23}$$

Problem 1.6. Let


f(x, y) = expected amount of gold obtained using an optimal sequence of
choices when A has expected amount x and B has expected
amount y. (1.24)

If choice A is made, we have

$$f_A(x, y) = p_1 \{ r_1 x + f[(1 - r_1)x, y] \}, \tag{1.25}$$

while choice B yields

$$f_B(x, y) = p_2 \{ s_1 y + f[x, (1 - s_1)y] \}. \tag{1.26}$$

Hence,

$$f(x, y) = \max \begin{bmatrix} p_1 \{ r_1 x + f[(1 - r_1)x, y] \} \\ p_2 \{ s_1 y + f[x, (1 - s_1)y] \} \end{bmatrix}, \quad x, y \ge 0. \tag{1.27}$$

We see, as noted above, that the two problems, 1.5 and 1.6, yield the same
functional equation, although quite different in structure. This will only be true in the
simplest versions of Problem 1.5.

Problem 1.7. Since the total amount of money in the game remains constant, $c$, it is
sufficient to specify the amount of money, $x$, held by the first player. Let

f(x) = probability that B is ruined before A when A has x and B has c - x
and when both sides use optimal strategies. (1.28)

If A uses the strategy $p = (p_1, p_2)$, where $p_1$ and $p_2$ denote, respectively, the frequencies
with which the first and second rows of $M$ are played, and if B uses the strategy
$q = (q_1, q_2)$, the frequencies with which B chooses the columns, we obtain, for
$0 < x < c$,

$$f(x) = p_1 q_1 f(x + 1) + p_1 q_2 f(x - 1) + p_2 q_1 f(x - 2) + p_2 q_2 f(x + 2). \tag{1.29}$$

Let us denote the right-hand side of this equation by $T[p, q, f(x)]$. If both sides play
optimally, we have

$$\begin{aligned}
f(x) &= \max_{p} \min_{q} T[p, q, f(x)] = \min_{q} \max_{p} T[p, q, f(x)], \quad 0 < x < c, \\
f(x) &= 0, \quad x \le 0, \\
f(x) &= 1, \quad x \ge c.
\end{aligned} \tag{1.30}$$

This is equivalent to saying that $f(x)$ is the value of the game whose payoff matrix is

$$\begin{pmatrix} f(x + 1) & f(x - 1) \\ f(x - 2) & f(x + 2) \end{pmatrix}. \tag{1.31}$$
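
The game-value form of the equation suggests a simple numerical experiment: start from a trial f, replace each f(x) by the value of the 2-by-2 game built from its neighbors, and repeat. The sketch below does this for integer fortunes. It is a hedged illustration only: the successive-approximation scheme, the fixed number of sweeps, and the names `value_2x2` and `survival_probabilities` are assumptions, and no convergence claim is made.

```python
def value_2x2(a, b, c, d):
    """Value to the row player of the zero-sum game with payoff rows (a, b) and (c, d)."""
    maxmin = max(min(a, b), min(c, d))
    minmax = min(max(a, c), max(b, d))
    if maxmin == minmax:                      # saddle point in pure strategies
        return maxmin
    return (a * d - b * c) / ((a - b) + (d - c))   # mixed-strategy value

def survival_probabilities(total, sweeps=200):
    """Successive approximations to f(1), ..., f(total - 1); `total` plays the role of c."""
    def lookup(f, x):
        if x <= 0:
            return 0.0                        # A is ruined
        if x >= total:
            return 1.0                        # B is ruined
        return f[x]
    f = {x: 0.0 for x in range(1, total)}
    for _ in range(sweeps):
        f = {x: value_2x2(lookup(f, x + 1), lookup(f, x - 1),
                          lookup(f, x - 2), lookup(f, x + 2))
             for x in range(1, total)}
    return f

f = survival_probabilities(6)
ordered = [f[x] for x in range(1, 6)]
```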


1.7.  Discussion 


It  is  important  to  observe  that  in  all  these  problems  the  functions — the  solutions  of 
the functional equations — are essentially secondary, since the optimal procedures are the
items of primary interest. Actually the two are equivalent, since a procedure defines a
function, and, conversely, a solution of the functional equation defines a procedure.
Frequently the procedure is quite easy to describe, whereas the function is quite complicated.
From the point of view of application, the function yields little or no immediate
information as to the structure of an optimal procedure, whereas the individual steps in the
process  may  illustrate  some  valuable  principles  that  may  be  applied  in  heuristic  fashion  to 
the  more  complicated  problems  which  frequently  and  almost  maliciously  defy  exact 
analysis. 



The plan of this study is first to formulate some general mathematical problems of the type discussed above with the concomitant existence and uniqueness theorems, and then to discuss a number of interesting simple representatives of the general problem.

It is not too difficult to subsume the problems we discuss under a more abstract framework. However, it is important to postpone this inundation until a large number of individual problems have been formulated and solved, since certain indigenous features of each problem in its native setting will facilitate its solution. It is further important that we consider problems that arise naturally in the external world, since, in general, it is only these that we can expect to possess solutions with simple and easily discernible structures. Particularly in a theory involving non-linear functional equations are the signposts of nature most valuable in preventing us from wandering desolately in the trackless wilderness of existence and uniqueness theorems.

1.8.  General  Mathematical  Formulation 

Let p be a point in an abstract space, a "phase space"; let $f(p)$ be a function of p, whose values lie in another abstract space; and let $T_k$ be a set of operators applicable to f. The general class of functional equations in which we are interested has the form

$$f(p) = \max_k\,[T_k(f)]. \tag{1.31}$$

Minimization problems are included in this formulation, since they may be converted into maximization problems by a simple change of sign. The index k may run over a finite, denumerably infinite, or non-denumerable set.

The simplest examples of such operators are furnished by the class

$$T_k(f) = g_k(p) + \sum_i h_{ki}(p)\, f(s_{ki}\,p), \tag{1.32}$$

where p is a point in n-dimensional space, $g_k(p)$ and $h_{ki}(p)$ are scalar functions, and $s_{ki}$ is a point transformation. Examples of equations connected with such transformations are (1.7), (1.11), (1.15), (1.17), (1.21), and (1.27). We shall consider only equations of this type in this study, except in Chapter 4, where some simple integral operators appear.

Problems leading to equations of this type arise from both deterministic and stochastic models, with slight differences of form, as we can see upon comparing Eq. (1.7), which arises from a deterministic model, with Eq. (1.11), which arises from a stochastic model. These slight differences force us to use different techniques in establishing existence and uniqueness theorems.

In this section, devoted to a mathematical formulation of the problems occurring in one phase of dynamic programming, we shall discuss various types of general problems that lead to the diverse classes of functional equations we shall consider in the remainder of the report.

1.8.1. Deterministic Investment Problems. At each stage of a sequence of operations we are permitted to divide our resources, of total amount x, into n parts $x_1, x_2, \ldots, x_n$, where $x_i \ge 0$ and $x_1 + x_2 + \cdots + x_n = x$. The return from this partition is given by $a(x_1, x_2, \ldots, x_n)$, and a total quantity $b(x_1, x_2, \ldots, x_n)$ will be available to continue, repeating the process. If we define

$f(x)$ = total return obtained using an optimal policy, with an unlimited number of operations, (1.33)

we obtain the functional equation

$$f(x) = \max_{x_i}\,\{a(x_1, x_2, \ldots, x_n) + f[b(x_1, x_2, \ldots, x_n)]\}, \tag{1.34}$$

where the $x_i$ are subject to the conditions $x_i \ge 0$, $x_1 + x_2 + \cdots + x_n = x$. The boundary condition is $f(0) = 0$, assuming, naturally, that $a(0, \ldots, 0) = 0$ and $b(0, \ldots, 0) = 0$.

1.8.2. Stochastic Investment Problems: The Gold-mining Problem. There are n sources of profit having respective total yields $x_1, x_2, \ldots, x_n$. We are allowed an unbounded number of operations and a choice of one of a set of possible actions on each operation. Associated with the kth operation there is a distribution function of returns:

$p_{ik}$ = probability of a return of $\sum_j a_{ijk} x_j$, (1.35)

where we assume that $\sum_i p_{ik} \le c_1 < 1$ for all k. This means that associated with each choice of an action there is a non-zero probability of not being able to continue the sequence of operations. If this kth operation yields a return of $\sum_j a_{ijk} x_j$, the remaining total yields are now $x_j(1 - a_{ijk})$, $j = 1, 2, \ldots, n$. Hence, if we set

$f(x_1, x_2, \ldots, x_n)$ = expected yield employing an optimal policy, (1.36)

we obtain for f the functional equation

$$f(x_1, x_2, \ldots, x_n) = \max_k \sum_i p_{ik} \Bigl\{ \sum_j a_{ijk} x_j + f[x_1(1 - a_{i1k}), \ldots, x_n(1 - a_{ink})] \Bigr\}. \tag{1.37}$$

The simplest example of this is furnished by the equation

$$f(x_1, x_2) = \max \begin{bmatrix} p_1\{r_1 x_1 + f[(1 - r_1)x_1,\ x_2]\} \\ p_2\{r_2 x_2 + f[x_1,\ (1 - r_2)x_2]\} \end{bmatrix}. \tag{1.38}$$

Here $p_i$ is the probability of receiving $r_i x_i$ and being allowed to continue the operations.

1.8.3. A Testing Problem. A system is known to be in some one of N + 1 different states, which we denote by $0, 1, 2, \ldots, N$, with an initial probability $p_k$ that it is in the kth state. By means of the following operations, we wish to transform it into a given state, which may as well be 0, with the certainty that it is there in a minimum time:

L: We observe the actual state of the system and proceed with that knowledge. This requires a time $t_L$.

A: We perform an operation $A_i$ that converts the original probability distribution $\{p_k\}$ into a new distribution $\{p_{ki}\}$. This operation consumes a time $t_i$.

Let us set

$$p = \begin{pmatrix} p_0 \\ p_1 \\ \vdots \\ p_N \end{pmatrix}, \qquad T_i p = \begin{pmatrix} p_{0i} \\ p_{1i} \\ \vdots \\ p_{Ni} \end{pmatrix}, \qquad x_k = \begin{pmatrix} 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{pmatrix}, \quad k = 0, 1, 2, \ldots, N, \tag{1.39}$$

where the "1" occurs in the kth place, and define

$f(p) = f(p_0, p_1, \ldots, p_N)$ = expected time required to have the system in state 0 with certainty, using an optimal policy. (1.40)

Then f satisfies the equation

$$f(p) = \min \begin{bmatrix} L: & t_L + \sum_{k=0}^{N} p_k f(x_k) \\ A_i: & t_i + f(T_i p) \end{bmatrix}, \quad p \ne x_0, \qquad f(x_0) = 0. \tag{1.41}$$

There is a natural discontinuity at $x_0$, since for $p \ne x_0$, no matter how close it may be, we must look or act, either of which consumes a certain non-zero time.

1.8.4. A Production Problem. Let us suppose that we are given initial amounts $x_1, x_2, \ldots, x_n$ of substances $A_1, A_2, \ldots, A_n$ with the knowledge that at each stage of a sequence of operations each substance may be used to produce both more of itself and more of the other substances. If it is desired to maximize the amount of one given substance we possess at the end of a fixed number of stages, a question arises as to the allocation of resources at each stage.

Let us consider a simple case in which there are only two substances, A and B, and in which a quantity x of A yields $c_1 x$ of A, if used to produce A, and $c_2 x$ of B, if used to produce B. Similarly y of B produces $d_1 y$ of A and $d_2 y$ of B. Assuming that at each stage we are allowed only these operations:

$$T_1:\ \begin{matrix} A \to A \\ B \to A \end{matrix}, \qquad T_2:\ \begin{matrix} A \to A \\ B \to B \end{matrix}, \qquad T_3:\ \begin{matrix} A \to B \\ B \to A \end{matrix}, \qquad T_4:\ \begin{matrix} A \to B \\ B \to B \end{matrix}, \tag{1.42}$$

we see that if the initial amounts of A and B are x and y, respectively, at the end of the first stage the results of the various operations will be

$$T_1 \binom{x}{y} = \binom{c_1 x + d_1 y}{0}, \quad T_2 \binom{x}{y} = \binom{c_1 x}{d_2 y}, \quad T_3 \binom{x}{y} = \binom{d_1 y}{c_2 x}, \quad T_4 \binom{x}{y} = \binom{0}{c_2 x + d_2 y}. \tag{1.43}$$

The effects of the various choices $T_1, T_2, T_3, T_4$ are equivalent to matrix operations:

$$T_1 = \begin{pmatrix} c_1 & d_1 \\ 0 & 0 \end{pmatrix}, \quad T_2 = \begin{pmatrix} c_1 & 0 \\ 0 & d_2 \end{pmatrix}, \quad T_3 = \begin{pmatrix} 0 & d_1 \\ c_2 & 0 \end{pmatrix}, \quad T_4 = \begin{pmatrix} 0 & 0 \\ c_2 & d_2 \end{pmatrix}. \tag{1.44}$$

If, given the initial vector $x_0$ whose components are x and y, we wish to maximize the final amount of A, the problem is that of choosing the sequence of matrices $T_{k_1}, T_{k_2}, \ldots, T_{k_n}$ so that the x-component of

$$x_n = T_{k_n} \cdots T_{k_2} T_{k_1} x_0 \tag{1.45}$$

will be a maximum.

In general we may wish to maximize some linear combination of the final amounts of the various commodities, which is to say, the inner product of x with a fixed vector c, $(x, c)$. If we define

$\phi_n(x) = (x_n, c)$ = value of the inner product obtained using an optimal n-stage procedure, (1.46)

we obtain the functional equation

$$\phi_n(x) = \max_k\, \phi_{n-1}(T_k x), \quad n > 1. \tag{1.47}$$
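
The recurrence (1.47) can be checked against a direct search over all sequences of operations. The Python sketch below does this for the two-substance case; the names `make_operations`, `phi`, and `brute_force`, the starting convention that the zero-stage value is the inner product (x, c) itself, and the numerical coefficients in the usage note are assumptions introduced for illustration.

```python
from itertools import product


def make_operations(c1, c2, d1, d2):
    """The four allowed operations, each mapping (x, y) to the new amounts of A and B."""
    return [
        lambda x, y: (c1 * x + d1 * y, 0.0),   # T1: A -> A, B -> A
        lambda x, y: (c1 * x, d2 * y),         # T2: A -> A, B -> B
        lambda x, y: (d1 * y, c2 * x),         # T3: A -> B, B -> A
        lambda x, y: (0.0, c2 * x + d2 * y),   # T4: A -> B, B -> B
    ]


def inner(state, weights):
    return state[0] * weights[0] + state[1] * weights[1]


def phi(n, state, weights, ops):
    """Recurrence (1.47); the zero-stage value (x, c) is an assumed base case."""
    if n == 0:
        return inner(state, weights)
    return max(phi(n - 1, op(*state), weights, ops) for op in ops)


def brute_force(n, state, weights, ops):
    """Try every sequence of n operations and keep the best final inner product."""
    best = float("-inf")
    for sequence in product(ops, repeat=n):
        current = state
        for op in sequence:
            current = op(*current)
        best = max(best, inner(current, weights))
    return best
```

With c1 = 2, c2 = 1, d1 = 1, d2 = 3, initial amounts (1, 1), and weights (1, 0), so that only the final amount of A counts, the best one-stage plan is T1, whereas the best two-stage plan first applies T2 and only then T1. This illustrates why the order of operations matters.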

This problem is complicated by a lack of invariance in time, i.e., $n \to n - 1$. If an invariant formulation is desired, we may be able to obtain this in certain cases by using the following device: We may suppose that we are performing these operations in order to meet some contingency that has a certain probability of occurring between stages of the sequence of operations.

Let

$\phi(x) = \phi(x_1, x_2)$ = probability that the contingency can be successfully met with current quantities $x_1$ of A and $x_2$ of B,

$f(x) = f(x_1, x_2)$ = probability that the contingency can be successfully met whenever it occurs, given initial amounts of $x_1$ of A and $x_2$ of B and using an optimal allocation policy,

p = probability that the contingency occurs between two stages. (1.48)

Then

$$f(x) = p\,\phi(x) + (1 - p) \max_k f(T_k x). \tag{1.49}$$

1.8.5.  An  Investment  Problem.  In  the  previous  problems  we  have  been  dealing 
with  expected  values  and  the  maximization  or  minimization  of  these  values.  Frequently, 
however,  the  actual  purpose  of  a  program  is  not  so  much  to  maximize  the  expected  value 
of  a  critical  variable  as  it  is  to  maximize  the  probability  that  this  variable  is  above  a  certain 
level. Consequently, it is of interest to compare the optimal strategy derived from using
the  expected-value  criterion  with  the  strategy  corresponding  to  the  more  realistic  criterion. 

There  is  an  additional  reason  for  studying  alternative  criteria.  It  is  possible  that  the 
strategy  which  yields  the  maximum  expected  value  will  have  an  undesirably  large  variance 
associated  with  it  which  makes  its  use  quite  risky.  On  the  other  hand,  it  is  to  be  expected 
that  the  strategy  associated  with  the  criterion  of  maximum  probability  will  automatically 
possess  a  smaller  variance. 

Let us consider a situation in which there is an infinite sequence of operations to be performed. At each stage we have a choice of one of the operations $A_1, A_2, \ldots, A_k$. If the ith operation is chosen, there is a probability $p_{ij}$, $1 \le j \le r$, of receiving an amount j, with $\sum_{j=1}^{r} p_{ij} < 1$ for all i, and a probability of terminating the sequence of operations equal to the remaining probability $1 - \sum_{j=1}^{r} p_{ij}$.

If we define

$f(n)$ = probability of obtaining a return greater than or equal to n, (1.50)

then clearly $f(n)$ satisfies the equation

$$f(n) = \max_i \Bigl[\sum_{j=1}^{r} p_{ij}\, f(n - j)\Bigr], \tag{1.51}$$

since, if one obtains j on the first operation, one continues so as to maximize the probability of obtaining at least $n - j$.

This  is  the  general  case  of  Problem  1.8.4,  discussed  previously. 
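
Equation (1.51) determines f(n) step by step once the values for smaller arguments are known. The short Python sketch below tabulates f in this way; the function name `success_probability`, the list-of-rows encoding of the probabilities p_ij, and the convention f(n) = 1 for n <= 0 (a non-positive target is already met) are assumptions made for illustration.

```python
def success_probability(target, operations):
    """Tabulate f(0..target) from Eq. (1.51).

    operations[i][j - 1] holds p_ij, the probability that operation i
    pays the amount j and the sequence of operations continues.
    """
    f = [1.0] + [0.0] * target          # f(0) = 1: nothing more is needed
    for n in range(1, target + 1):
        best = 0.0
        for row in operations:
            total = 0.0
            for j, p in enumerate(row, start=1):
                # A target of n - j <= 0 has already been reached.
                total += p * (f[n - j] if n - j > 0 else 1.0)
            best = max(best, total)
        f[n] = best
    return f
```

With a single operation paying 1 with probability 1/2, the table is simply the powers of 1/2. With two operations, one paying 1 with probability 0.9 and one paying 2 with probability 0.95, the second operation is preferred whenever the remaining target is small.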

1.8.6. An Optimal Inventory Problem. Let us assume that we have a quantity, x, of merchandise on hand, and that there is a probability $\phi(y)\,dy$ that at some specified time we shall be called on to deliver a quantity y of this merchandise. To meet this potential demand, we may order an additional quantity, z, of merchandise at a cost of $g(z)$. If the demand, y, exceeds the total quantity, $x + z$, the request for merchandise is satisfied as far as possible, and a penalty cost of M is levied. Assuming that this situation repeats itself indefinitely, and that future costs are discounted at a fixed rate, a, determine the ordering policy which minimizes the over-all expected cost.

Let

$f(x)$ = expected total cost using an optimal ordering policy. (1.52)

If z is ordered initially, the total expected cost will be

$$T(z, f) = g(z) + a\Bigl\{[M + f(0)] \int_{x+z}^{\infty} \phi(y)\,dy + \int_{0}^{x+z} f(x + z - y)\,\phi(y)\,dy\Bigr\}. \tag{1.53}$$

Since z is to be chosen to minimize the total cost, we have, for our functional equation,

$$f(x) = \inf_{z \ge 0}\,[T(z, f)]. \tag{1.54}$$
CHAPTER 2

EXISTENCE  AND  UNIQUENESS  THEOREMS 


2.1.  Introduction 


In  this  chapter  we  shall  discuss  the  questions  of  the  existence  and  uniqueness  of  the 
solutions  of  the  various  functional  equations  formulated  in  response  to  the  problems 
posed  in  Chapter  1. 

We shall first show how that general factotum, the method of successive approximations, yields, under assumptions that are natural to the problem, existence and uniqueness theorems together with information concerning the dependence of the solution on the variables and parameters in the equation.

Using  these  facts,  we  shall  turn  to  a  discussion  of  the  rigorous  concept  of  a  solution 
and  then  to  various  questions  concerning  computational  and  approximate  methods.  This 
last  is  of  great  practical  importance,  since  the  non-linearity  of  the  equations  reduces  the 
number  that  may  be  resolved  purely  by  analytic  means  to  a  woeful  handful. 


2.2. The Equation $f(p) = \max_k\,[g_k(p) + h_k(p) f(T_k p)]$

In Section 1.6 we encountered the equation

$$f(x, y) = \max \begin{bmatrix} p_1\{r_1 x + f[(1 - r_1)x,\ y]\} \\ p_2\{r_2 y + f[x,\ (1 - r_2)y]\} \end{bmatrix}$$

in connection with a gold-mining problem.

This is a special case of the more general equation

$$f(p) = \max_{1 \le k \le n} \Bigl[ g_k(p) + \sum_{i=1}^{M} h_{ik}(p)\, f(T_{ik}\,p) \Bigr],$$

where p is a point in N-dimensional space, $E_N$, and $T_{ik}p$ is a transformation taking p into another point in $E_N$. To simplify the notation we shall assume throughout that M = 1. It will be quickly seen that this is no essential restriction.

Our  first  result  is 

Theorem 2.1. Consider the equation

$$f(p) = \max_{1 \le k \le n}\,[g_k(p) + h_k(p) f(T_k p)], \tag{2.1}$$

where we assume that

(a) The point p is restricted to a region R with the property that $p \in R$ implies that $T_k p \in R$,

(b) $0 \le g_k(p) \le c_1$ for $p \in R$,

(c) $0 \le h_k(p) \le c_2 < 1$ for $p \in R$. (2.2)

Under these conditions there is a unique bounded solution to (2.1).

As we shall see below, conditions (2.2b) and (2.2c) could be replaced by $|g_k| \le c_1$, $|h_k| \le c_2 < 1$ without affecting the validity of the final result. In most applications, however, (2.2b) and (2.2c) will be realized, since $h_k$ represents a probability and $g_k$ an expected gain. Our first application of successive approximations will rely heavily upon (2.2b) and (2.2c).

There  are  several  ways  of  applying  the  method  of  successive  approximations  that  are 
distinct  not  only  analytically,  but  also  conceptually. 

The first takes its origin in the viewpoint that an infinite process is only sensibly defined as a limit of a finite process. We consider, then, that at first we are allowed only n stages. If we define

$f_n(p)$ = return obtained using an optimal policy when at most n stages are allowed, (2.3)

we obtain the recurrence relations

$$\begin{aligned} f_0(p) &= \max_k\,[g_k(p)], \\ f_{n+1}(p) &= \max_k\,[g_k(p) + h_k(p) f_n(T_k p)], \quad n = 0, 1, 2, \ldots. \end{aligned} \tag{2.4}$$

Let us show that the sequence $\{f_n(p)\}$ converges to a solution of (2.1). Since $g_k, h_k \ge 0$, it is clear that $f_1(p) \ge f_0(p)$ for all p in R. From this it follows inductively that $f_{n+1} \ge f_n \ge \cdots \ge f_1 \ge f_0 \ge 0$. If we set $u_n = \sup_R f_n(p)$, we obtain from (2.4), using (2.2a),

$$u_{n+1} \le c_1 + c_2 u_n, \tag{2.5}$$

which shows that $u_n \le c_1/(1 - c_2)$, $n = 0, 1, 2, \ldots$.

It follows, then, that for each $p \in R$, the sequence $\{f_n(p)\}$, being monotone and bounded, converges to a function that we call $f(p)$. It remains to demonstrate that $f(p)$ is actually a solution of (2.1).
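
The monotone scheme (2.4) is easy to try out when the region R is replaced by a finite set of points. The Python sketch below does so; the finite state space, the function name `successive_approximations`, and the table layout `g[k][p]`, `h[k][p]`, `T[k][p]` (option k applied at point p) are assumptions made for illustration and are not part of the argument in the text.

```python
def successive_approximations(g, h, T, iterations):
    """Iterate (2.4) on a finite set of points 0..S-1.

    g[k][p], h[k][p] : gain and continuation factor of option k at point p
    T[k][p]          : index of the point reached from p under option k
    Returns the list [f_0, f_1, ..., f_iterations].
    """
    options = range(len(g))
    points = range(len(g[0]))
    f = [max(g[k][p] for k in options) for p in points]           # f_0
    history = [f]
    for _ in range(iterations):
        f = [max(g[k][p] + h[k][p] * f[T[k][p]] for k in options)
             for p in points]                                     # f_{n+1}
        history.append(f)
    return history
```

In a two-point example with c_1 = 2 and c_2 = 0.9, one can observe both properties used in the proof: the iterates never decrease, and they stay below c_1/(1 - c_2) = 20.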

Turning to (2.4), we see that the monotone character of $f_n$ yields

$$f_{n+1}(p) \le \max_k\,[g_k(p) + h_k(p) f(T_k p)], \tag{2.6}$$

whence

$$f(p) \le \max_k\,[g_k(p) + h_k(p) f(T_k p)]. \tag{2.7}$$

Similarly, from (2.4),

$$f(p) \ge \max_k\,[g_k(p) + h_k(p) f_n(T_k p)], \tag{2.8}$$

whence

$$f(p) \ge \max_k\,[g_k(p) + h_k(p) f(T_k p)]. \tag{2.9}$$

Comparing (2.7) and (2.9), we see that we must have equality.

Let us now demonstrate the uniqueness of this solution, $f(p)$. Let $F(p)$ be another bounded solution satisfying (2.1):

$$F(p) = \max_k\,[g_k(p) + h_k(p) F(T_k p)]. \tag{2.10}$$

Let $k = k(p)$ be an index which yields the maximum of $g_k(p) + h_k(p) f(T_k p)$, and let $m = m(p)$ be a corresponding index for $g_k(p) + h_k(p) F(T_k p)$. Then by virtue of the maximum property we have

$$\begin{aligned} f(p) &= g_k(p) + h_k(p) f(T_k p) \ge g_m(p) + h_m(p) f(T_m p), \\ F(p) &= g_m(p) + h_m(p) F(T_m p) \ge g_k(p) + h_k(p) F(T_k p). \end{aligned} \tag{2.11}$$

Hence,

$$\begin{aligned} f(p) - F(p) &\ge h_m(p)[f(T_m p) - F(T_m p)], \\ f(p) - F(p) &\le h_k(p)[f(T_k p) - F(T_k p)], \end{aligned} \tag{2.12}$$

which yields the result

$$|f(p) - F(p)| \le \max \begin{bmatrix} h_m(p)\,|f(T_m p) - F(T_m p)| \\ h_k(p)\,|f(T_k p) - F(T_k p)| \end{bmatrix}. \tag{2.13}$$

If we define

$$S = \sup_R |f(p) - F(p)|, \tag{2.14}$$

we have for a p for which $|f(p) - F(p)| \ge S - \varepsilon$, $\varepsilon$ small, from (2.13),

$$S - \varepsilon \le c_2 S. \tag{2.15}$$

Since $c_2 < 1$, this leads to a contradiction for $\varepsilon$ sufficiently small, unless S = 0. This completes the proof of the uniqueness.
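
The last step can be spelled out in one line. Rearranging (2.15) and letting the tolerance shrink gives, in the notation of the proof (with $\varepsilon$ standing for the small quantity e of the text):

```latex
\begin{aligned}
S - \varepsilon \le c_2 S
  &\;\Longrightarrow\; (1 - c_2)\,S \le \varepsilon \\
  &\;\Longrightarrow\; 0 \le S \le \frac{\varepsilon}{1 - c_2}
  \;\longrightarrow\; 0 \qquad (\varepsilon \to 0).
\end{aligned}
```

Since S does not depend on the tolerance, it must vanish.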

The second application of successive approximations proceeds upon the basis that the physical origin of the equation is of no interest. We choose, consequently, an arbitrary non-negative function, $f_0(p)$, uniformly bounded over R, as our first approximation.

The recurrence relation is now

$$f_{n+1}(p) = \max_k\,[g_k(p) + h_k(p) f_n(T_k p)], \quad n = 0, 1, 2, \ldots. \tag{2.16}$$

To show that the sequence $\{f_n(p)\}$ converges, we use the device of (2.11), above. Let $k = k(p)$ denote an index that furnishes the maximum for $f_{n+1}$, and let $m = m(p)$ denote a corresponding index for $f_n$. Proceeding as in (2.12), we obtain, for $n \ge 1$,

$$|f_{n+1}(p) - f_n(p)| \le \max \begin{bmatrix} c_2\,|f_n(T_m p) - f_{n-1}(T_m p)| \\ c_2\,|f_n(T_k p) - f_{n-1}(T_k p)| \end{bmatrix}. \tag{2.17}$$
(2.17) 
If we define

$$u_n = \sup_R |f_n(p) - f_{n-1}(p)|, \quad n = 1, 2, \ldots, \tag{2.18}$$

we obtain, from (2.17), $u_{n+1} \le c_2 u_n$. Hence, by iteration, $u_{n+1} \le c_2^{\,n} u_1$, where $u_1$ is finite because $f_0$ is uniformly bounded, say $\sup_R f_0(p) = c_3$. This shows that the series

$$\sum_{n=0}^{\infty} (f_{n+1} - f_n) \tag{2.19}$$

converges uniformly in R, and thus that the sequence $\{f_n\}$ converges uniformly in R to a function $f(p)$.
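
The geometric decay of the successive differences can be observed numerically on a finite set of points. In the Python sketch below, the function name `sup_gaps`, the finite state space, and the table layout `g[k][p]`, `h[k][p]`, `T[k][p]` are assumptions made for illustration; the starting function f_0 is arbitrary, as in the text.

```python
def sup_gaps(f0, g, h, T, iterations):
    """Iterate (2.16) from an arbitrary start f0.

    Returns the final iterate and the list of sup-norm gaps
    u_1, u_2, ... between consecutive iterates, as in (2.18).
    """
    options = range(len(g))
    f = list(f0)
    gaps = []
    for _ in range(iterations):
        new = [max(g[k][p] + h[k][p] * f[T[k][p]] for k in options)
               for p in range(len(f))]
        gaps.append(max(abs(a - b) for a, b in zip(new, f)))
        f = new
    return f, gaps
```

Starting the same two-point example from two very different bounded functions, each gap is at most c_2 times the previous one, and both runs settle on the same limit, in line with the uniqueness result.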

It follows from this that $f(p)$ will be continuous if each $f_n(p)$ is a continuous function of p. This will be true if $g_k(p)$ and $h_k(p)$ are continuous in p, and if $f_0(p)$ is chosen to be continuous.

We have thus demonstrated

Theorem 2.2. Under the conditions

(a) $g_k(p)$ is a continuous function of p in R,

(b) $h_k(p)$ is a continuous function of p in R, and $|h_k(p)| \le c_2 < 1$, (2.20)

the solution of (2.1) is a continuous function of p.

Furthermore, if $g_k(p)$ and $h_k(p)$ are continuous functions of a set of parameters, q, $f(p)$ will be a continuous function of these parameters.

In  Section  2.7,  below,  devoted  to  approximate  and  computational  techniques,  we  shall 
show  that  a  combination  of  the  above  two  ideas  can  be  used  in  many  cases  to  furnish  quite 
useful  initial  approximations. 

2.3. The Equation $f(x) = \max\,\{a(x_1, x_2, \ldots, x_n) + f[b(x_1, x_2, \ldots, x_n)]\}$

The existence and uniqueness theorem in the previous section does not apply to the equation

$$f(x) = \max_{0 \le y \le x}\,\{g(y) + h(x - y) + f[ay + b(x - y)]\}, \tag{2.21}$$

encountered in Section 1.6 in connection with an investment problem. To remedy this, we shall prove

Theorem 2.3. Consider the equation

$$f(x) = \max_R\,\{a(x_1, x_2, \ldots, x_n) + f[b(x_1, x_2, \ldots, x_n)]\}, \tag{2.22}$$

where $R = R(x)$ is defined by $x_k \ge 0$, $\sum_k x_k = x$.

If

(a) $a(x_1, x_2, \ldots, x_n)$ is continuous over $R(x)$ for $0 \le x \le x_0$ and non-negative, $a(0, 0, \ldots, 0) = 0$,

(b) $b(x_1, x_2, \ldots, x_n)$ is continuous and non-negative over R, and $b(x_1, x_2, \ldots, x_n) \le c \sum_{k=1}^{n} x_k$, $0 < c < 1$, in $R(x_0)$,

(c) $\sum_{l=0}^{\infty} h(c^l x_0) < \infty$, where

$$h(x) = \max_{0 \le y \le x}\,\bigl[\max_{R(y)} a(x_1, x_2, \ldots, x_n)\bigr], \tag{2.23}$$

there is a unique continuous solution to (2.22) for which $f(0) = 0$, for $0 \le x \le x_0$.

Proof. Let $f_0(x)$ be the value obtained by choosing $x_1 = x$, $x_2 = x_3 = \cdots = x_n = 0$ repeatedly. Then

$$f_0(x) = a(x) + a[b(x)] + \cdots, \tag{2.24}$$

where we have set $a(x) = a(x, 0, \ldots, 0)$, $b(x) = b(x, 0, \ldots, 0)$. The series on the right is majorized by

$$\sum_{l=0}^{\infty} h(c^l x_0),$$

and hence converges uniformly.

Define

$$f_{n+1}(x) = \max_{R(x)}\,\{a(x_1, x_2, \ldots, x_n) + f_n[b(x_1, x_2, \ldots, x_n)]\}. \tag{2.25}$$

From the definition of $f_0$, it follows that $f_1 \ge f_0$, and hence that $f_{n+1} \ge f_n$. Let $M_n(x) = \max_{0 \le y \le x} f_n(y)$. Then, from (2.25),

$$M_{n+1}(x) \le h(x) + M_n(cx), \tag{2.26}$$

whence $M_n(x) \le \sum_{l=0}^{\infty} h(c^l x)$. Therefore, $f_n(x)$ converges to a function $f(x)$ for all x in $[0, x_0]$, which, as above, is readily seen to be a solution.

The technique utilized above is readily adapted to show the uniqueness of a continuous solution, f. If g is another continuous solution, we obtain, for a pair of points $(x_1, x_2, \ldots, x_n)$, $(y_1, y_2, \ldots, y_n)$ which yield the respective maxima,

$$\begin{aligned} f(x) &= a(x_1, x_2, \ldots, x_n) + f[b(x_1, x_2, \ldots, x_n)] \\ &\ge a(y_1, y_2, \ldots, y_n) + f[b(y_1, y_2, \ldots, y_n)], \\ g(x) &= a(y_1, y_2, \ldots, y_n) + g[b(y_1, y_2, \ldots, y_n)] \\ &\ge a(x_1, x_2, \ldots, x_n) + g[b(x_1, x_2, \ldots, x_n)], \end{aligned} \tag{2.27}$$

whence

$$|f(x) - g(x)| \le \max\,\{\,|f[b(x_1, x_2, \ldots, x_n)] - g[b(x_1, x_2, \ldots, x_n)]|,\ |f[b(y_1, y_2, \ldots, y_n)] - g[b(y_1, y_2, \ldots, y_n)]|\,\}. \tag{2.28}$$

Let

$$M(x) = \max_{0 \le y \le x} |f(y) - g(y)|. \tag{2.29}$$

Then, from (2.28),

$$M(x) \le M(cx) \le \cdots \le M(c^n x), \quad n = 1, 2, \ldots. \tag{2.30}$$

Since $M(x)$ is continuous and $M(0) = 0$, we obtain $M(x) \le 0$, which means that $M(x)$ is identically zero. This completes the proof.
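
The passage from (2.30) to the conclusion uses only the continuity of M at the origin and the fact that $0 < c < 1$. Written out as a single chain:

```latex
0 \;\le\; M(x) \;\le\; M(cx) \;\le\; \cdots \;\le\; M(c^{n}x),
\qquad
\lim_{n \to \infty} M(c^{n}x) = M(0) = 0,
\qquad\text{so}\qquad M(x) = 0 .
```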

Let us note that all we have actually proved at this point is that there is a unique bounded solution to

$$f(x) = \sup_R\,\{a(x_1, x_2, \ldots, x_n) + f[b(x_1, x_2, \ldots, x_n)]\}, \qquad f(0) = 0, \tag{2.31}$$

since, at the moment, we have no assurance that the maximum is assumed. The simplest way to ensure that the maximum is assumed is to prove that f itself is continuous. This fact and the corresponding results concerning continuity as a function of parameters may be readily derived by using the modification of the method in Section 2.2, Eqs. (2.17) through (2.19), given above.


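2.4. The Equation $f(p) = \min\,[1 + \sum_{k=0}^{n} p_k f(x_k),\ 1 + f(T_l p)]$

Let us now turn our attention to the more complicated functional equation involved in the testing problem discussed in Section 1.8.3:

$$f(p) = \min\,\Bigl\{1 + \sum_{k=0}^{n} p_k f(x_k),\ \min_l\,[1 + f(T_l p)]\Bigr\}, \qquad f(x_0) = 0, \tag{2.32}$$

where l runs over the set $l = 1, 2, \ldots, M$. Here, we set

$$p = \begin{pmatrix} p_0 \\ p_1 \\ \vdots \\ p_n \end{pmatrix}, \qquad T_l p = \begin{pmatrix} p_{0l} \\ p_{1l} \\ \vdots \\ p_{nl} \end{pmatrix}, \qquad x_k = \begin{pmatrix} 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{pmatrix}, \tag{2.33}$$

where $p_{kl} = p_{kl}(p_0, p_1, \ldots, p_n)$, the "1" occurring in the kth place, and take $f(p)$ to be a scalar function of p. That $f(x_0) = 0$ is a consequence of the fact that $x_0$ is the desired state and that no action is taken when $p = x_0$.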

We shall prove

Theorem 2.4. If for each transformation $T_l$ it is true that

$$\sum_{k=1}^{n} p_{kl} \le c_1 \sum_{k=1}^{n} p_k, \quad 0 < c_1 < 1, \tag{2.34}$$

for all p such that $\sum_{k=0}^{n} p_k = 1$, $p_k \ge 0$, then there exists a unique bounded positive solution to Eq. (2.32).

Proof. We shall once more employ the method of successive approximations. Consider the procedure $S_1$, represented by $LT_1LT_1\cdots$, which means that we look, then act, if p is not $x_0$; look, act; and so on, repeatedly. Similarly, define $S_2$ by $T_1LT_1L\cdots$. It is clear from simple considerations of probability theory that the expected times $F_1(p)$, $F_2(p)$ required to transform p into $x_0$ with certainty are finite for both strategies.

To calculate $F_1(p) = f_{S_1}(p)$ and $F_2(p) = f_{S_2}(p)$, we employ the equations

$$F_2(p) = 1 + F_1(T_1 p), \qquad F_1(p) = 1 + \sum_{k} p_k F_2(x_k), \qquad p \ne x_0. \tag{2.35}$$

Hence

$$F_2(x_k) = 2 + \sum_{j} p_{j1}(x_k)\, F_2(x_j), \qquad k = 1, 2, \ldots, n. \tag{2.36}$$

Since $\sum_{j \ge 1} p_{j1}(x_k) \le c_1 < 1$, the determinant of the system does not vanish, and thus the system has a unique solution, necessarily positive, as we see by solving iteratively. Having determined the values at the points $x_k$, we readily determine $F_1(p)$ and $F_2(p)$.

Now define

$$\begin{aligned} f_0(p) &= \min\,[F_1(p),\ F_2(p)], \\ f_{n+1}(p) &= \min\,\Bigl[1 + \sum_{k} p_k f_n(x_k),\ \min_l\,[1 + f_n(T_l p)]\Bigr], \qquad f_{n+1}(x_0) = 0. \end{aligned} \tag{2.37}$$

Considering the relations of the preceding paragraph, it is clear that $f_1(p) \le f_0(p)$, whence $f_{n+1} \le f_n$, and the sequence, being monotone and bounded below, converges to a function $f(p)$. As in the previous sections, the limit function is seen to satisfy (2.32).

The uniqueness proof is rather more involved. Let f and g be two bounded solutions. We require the lemma that

$$\sup_p |f(p) - g(p)| \le \max_k |f(x_k) - g(x_k)|, \tag{2.38}$$

the supremum being taken over $p \ne x_0$.

To prove the lemma we consider the four cases


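$$\begin{array}{lll} \text{(a)} & f(p) = 1 + \sum_{k=0}^{n} p_k f(x_k), & g(p) = 1 + \sum_{k=0}^{n} p_k g(x_k), \\ \text{(b)} & f(p) = 1 + \sum_{k=0}^{n} p_k f(x_k), & g(p) = 1 + g(T_l p), \\ \text{(c)} & f(p) = 1 + f(T_l p), & g(p) = 1 + \sum_{k=0}^{n} p_k g(x_k), \\ \text{(d)} & f(p) = 1 + f(T_l p), & g(p) = 1 + g(T_{l'} p). \end{array} \tag{2.39}$$

Consider first the case corresponding to (a). We have

$$f(p) - g(p) = \sum_{k=0}^{n} p_k\,[f(x_k) - g(x_k)], \tag{2.40}$$

whence

$$|f(p) - g(p)| \le \max_k |f(x_k) - g(x_k)|. \tag{2.41}$$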

Therefore, for all p for which (a) holds, the assertion of the lemma is correct. The equations of (2.39a) will hold whenever p is close enough to $x_0$, since $f(p), g(p) \ge 1$ for all $p \ne x_0$, whereas $1 + \sum_k p_k f(x_k)$ and $1 + \sum_k p_k g(x_k)$ are close to 1 for p close to $x_0$. Thus, $1 + f(T_l p)$ and $1 + g(T_l p)$ will be dominated by the observation moves for p close to $x_0$.

This is an important point, since the crux of our proof is the fact that (2.39a) will always occur after a finite number of moves, by virtue of Eq. (2.34).

Now consider Eq. (2.39b). Since each solution is the minimum of the available alternatives, we have

$$\begin{aligned} f(p) &= 1 + \sum_{k=0}^{n} p_k f(x_k) \le 1 + f(T_l p), \\ g(p) &= 1 + g(T_l p) \le 1 + \sum_{k=0}^{n} p_k g(x_k). \end{aligned} \tag{2.42}$$

Hence,

$$|f(p) - g(p)| \le \max\,\Bigl[\max_k |f(x_k) - g(x_k)|,\ \sup_p |f(T_l p) - g(T_l p)|\Bigr], \tag{2.43}$$

and similarly for (2.39c). For (2.39d) we obtain

$$\sup_p |f(p) - g(p)| \le \max\,\Bigl[\sup_p |f(T_l p) - g(T_l p)|,\ \sup_p |f(T_{l'} p) - g(T_{l'} p)|\Bigr]. \tag{2.44}$$

We now iterate these inequalities. For any fixed p, $T_{l_1} T_{l_2} \cdots T_{l_r} p$ will be in the region governed by (2.39a) for r large enough. Consequently, we obtain

$$\sup_p |f(p) - g(p)| \le \max_k |f(x_k) - g(x_k)|. \tag{2.45}$$

This completes the proof of the lemma. It remains to show that $\max_k |f(x_k) - g(x_k)| = 0$. Let K be an index at which the maximum is assumed. It follows from the functional equation for f and g that

$$\begin{aligned} f(x_K) &= 1 + f(T_l x_K), \quad l = l(K), \\ g(x_K) &= 1 + g(T_{l'} x_K), \quad l' = l'(K), \end{aligned} \tag{2.46}$$

and, by the minimum property, that

$$\begin{aligned} f(x_K) &= 1 + f(T_l x_K) \le 1 + f(T_{l'} x_K), \\ g(x_K) &= 1 + g(T_{l'} x_K) \le 1 + g(T_l x_K). \end{aligned} \tag{2.47}$$

If both inequalities are proper, we obtain

$$|f(x_K) - g(x_K)| < \max\,\bigl[\,|f(T_l x_K) - g(T_l x_K)|,\ |f(T_{l'} x_K) - g(T_{l'} x_K)|\,\bigr] \le \sup_p |f(p) - g(p)|, \tag{2.48}$$

which contradicts (2.45). Thus, for l or l' we have

$$\begin{aligned} f(x_K) &= 1 + f(T_l x_K), \\ g(x_K) &= 1 + g(T_l x_K). \end{aligned} \tag{2.49}$$

This means that the first moves can be the same.

Consider now the situation for second moves. Using the same argument, we see that the second moves, i.e., the equations for $f(T_l x_K)$, $g(T_l x_K)$, can be the same, and so on, by induction.

Let $p_n = p_n(x_K)$ be the distribution achieved after n moves, where the (n + 1)th move puts $x_K$ into the region governed by (2.39a). The same argument as that above shows that both f and g may be put into this situation at the same move. Then

$$\begin{aligned} f(x_K) &= (n + 1) + \sum_{k} p_{kn} f(x_k), \\ g(x_K) &= (n + 1) + \sum_{k} p_{kn} g(x_k). \end{aligned} \tag{2.50}$$

Therefore,

$$f(x_K) - g(x_K) = \sum_{k} p_{kn}\,[f(x_k) - g(x_k)]. \tag{2.51}$$

By assumption, $p_{0n} > 0$. Therefore,

$$|f(x_K) - g(x_K)| \le (1 - p_{0n})\,|f(x_K) - g(x_K)| \tag{2.52}$$

implies that $|f(x_K) - g(x_K)| = 0$. This implies that $f(p) - g(p) \equiv 0$, and completes our proof.

2.5. The Optimal Inventory Equation

Let us now treat the equation that occurs in connection with the optimal inventory problem discussed in Section 1.8.6,

$$u(x) = \inf_{y \ge x}\,\Bigl[g(y - x) + a\Bigl\{u(0)[1 - F(y)] + \int_{y}^{\infty} \phi(v, y)\,dF(v) + \int_{0}^{y} u(y - v)\,dF(v)\Bigr\}\Bigr], \tag{2.53}$$

where we shall assume that

(a) $g(0) = 0$, $g(y) \ge 0$ for $y \ge 0$,

(b) $\int_{0}^{\infty} dF = 1$, $dF \ge 0$,

(c) $0 < a < 1$,

(d) $0 \le \int_{y}^{\infty} \phi(v, y)\,dF(v) \le c_1 < \infty$ for all $y \ge 0$. (2.54)

Under these assumptions we shall prove

Theorem 2.5. There is a unique uniformly bounded solution to (2.53).*

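Proof. We shall, as before, employ the method of successive approximations. Let

$$\begin{aligned} u_0(x) &= a \Bigl\{ u_0(0)[1 - F(x)] + \int_x^\infty \psi(v, x)\, dF(v) + \int_0^x u_0(x - v)\, dF(v) \Bigr\}, \\ u_{n+1}(x) &= \inf_{y \ge x} \Bigl[ g(y - x) + a \Bigl\{ u_n(0)[1 - F(y)] + \int_y^\infty \psi(v, y)\, dF(v) + \int_0^y u_n(y - v)\, dF(v) \Bigr\} \Bigr]. \end{aligned} \qquad (2.55)$$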


* This result is contained, along with a multitude of other results, in the paper of Dvoretzky, Kiefer, and Wolfowitz referred to in the Preface. The method of proof given here is, however, different from theirs.
Setting $x = 0$, we obtain

$$u_0(0) = \frac{a \int_0^\infty \psi(v, 0)\, dF(v)}{1 - a}. \qquad (2.56)$$

Substituting in (2.55), we obtain an equation, the "renewal" equation, for $u_0(x)$, which may be readily solved by Laplace Transform techniques or by simple iteration, since $0 < a < 1$, and $\int_0^\infty dF = 1$, $dF \ge 0$.

Referring to (2.55) again, we see that

$$u_1(x) = \inf_{y \ge x} [g(y - x) + u_0(y)]. \qquad (2.57)$$

Since

$$\inf_{y \ge x} [g(y - x) + u_0(y)] \le g(0) + u_0(x) = u_0(x), \qquad (2.58)$$

we obtain the important result that $u_1(x) \le u_0(x)$. From this it is immediately clear, using the recurrence relation in (2.55), that $u_2 \le u_1$, and that, generally,

$$u_0 \ge u_1 \ge \cdots \ge u_n \ge \cdots \ge 0. \qquad (2.59)$$

It follows that for all $x \ge 0$, the sequence $\{u_n(x)\}$ converges to a uniformly bounded function $u(x)$. Using (2.55) again, it is clear that $u(x)$ satisfies (2.53).

It is not difficult to use the methods of the previous sections to show that the convergence is actually geometric, i.e., $|u(x) - u_n(x)| \le c_2 a^n$, for some $c_2 > 0$.
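The successive approximations of (2.55) can be tried numerically. The following is only a minimal sketch under assumptions that are not in the text: stock levels and demands are restricted to integers, the stock is capped at a hypothetical `max_stock`, and the demand distribution, the linear ordering cost, and the linear penalty are invented for illustration. It builds $u_0$ (the cost of never ordering) by simple iteration and then applies the recurrence, so that the monotone decrease of (2.59) can be observed.

```python
# Sketch of the successive approximations (2.55) on an integer stock grid.
# The demand distribution, cost functions, and grid size are illustrative assumptions.

def expected_cost(u, y, pmf, a, penalty):
    """Discounted expected cost after stocking up to level y, given cost function u."""
    total = 0.0
    for v, prob in enumerate(pmf):
        if v > y:
            # demand exceeds stock: pay the penalty and continue from zero stock
            total += prob * (penalty(v, y) + u[0])
        else:
            # demand is met: continue from the remaining stock y - v
            total += prob * u[y - v]
    return a * total


def successive_approximations(pmf, a, order_cost, penalty, max_stock, n_iter):
    """Return [u_0, u_1, ..., u_n_iter], each a list indexed by stock level."""
    levels = range(max_stock + 1)
    # u_0 is the cost of never ordering; solve its equation by simple iteration.
    u = [0.0] * (max_stock + 1)
    for _ in range(5000):
        new = [expected_cost(u, x, pmf, a, penalty) for x in levels]
        change = max(abs(s - t) for s, t in zip(new, u))
        u = new
        if change < 1e-13:
            break
    iterates = [u]
    for _ in range(n_iter):
        after = [expected_cost(iterates[-1], y, pmf, a, penalty) for y in levels]
        iterates.append([
            min(order_cost(y - x) + after[y] for y in range(x, max_stock + 1))
            for x in levels
        ])
    return iterates


iterates = successive_approximations(
    pmf=[0.1, 0.3, 0.3, 0.2, 0.1],        # P(demand = 0), ..., P(demand = 4)
    a=0.8,                                # discount factor, 0 < a < 1
    order_cost=lambda z: 1.0 * z,         # g(z), with g(0) = 0
    penalty=lambda v, y: 4.0 * (v - y),   # penalty for unmet demand
    max_stock=15,
    n_iter=30,
)
```

Each entry of `iterates` is one approximation $u_n$ tabulated over the stock levels; comparing consecutive entries shows the sequence decreasing toward its limit, as the argument above predicts.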

To establish uniqueness we proceed as before. Let us assume that there are two solutions, $u$ and $w$, both bounded in any finite interval $[0, x]$. Let $y = y(x)$ and $z = z(x)$ be two decision functions that yield values of $u(x)$ and $w(x)$, respectively, within $\epsilon_1$ and $\epsilon_2$ of the actual infima, where $\epsilon_1$ and $\epsilon_2$ are small positive quantities.

Then we have

$$\begin{aligned} u(x) &= T(u, y) + \epsilon_1 \le T(u, z) + \epsilon_3, \\ w(x) &= T(w, z) + \epsilon_2 \le T(w, y) + \epsilon_4, \end{aligned} \qquad (2.60)$$

where $\epsilon_3$ and $\epsilon_4$ are again small quantities, and $T(u, y)$ is an abbreviation for

$$g(y - x) + a \Bigl\{ u(0)[1 - F(y)] + \int_y^\infty \psi(v, y)\, dF(v) + \int_0^y u(y - v)\, dF(v) \Bigr\}. \qquad (2.61)$$

We have then

$$\begin{aligned} u(x) - w(x) &\le \epsilon_3 - \epsilon_2 + T(u, z) - T(w, z), \\ u(x) - w(x) &\ge \epsilon_1 - \epsilon_4 + T(u, y) - T(w, y). \end{aligned} \qquad (2.62)$$

Since

$$T(u, z) - T(w, z) = a \int_0^z [u(z - v) - w(z - v)]\, dF(v) + a[u(0) - w(0)][1 - F(z)], \qquad (2.63)$$

the inequalities in (2.62) yield

$$|u(x) - w(x)| \le \max \Bigl[\, \epsilon_5 + a \int_0^z |u(z - v) - w(z - v)|\, dF(v) + a\,|u(0) - w(0)|\,[1 - F(z)],\ \ \epsilon_5 + a \int_0^y |u(y - v) - w(y - v)|\, dF(v) + a\,|u(0) - w(0)|\,[1 - F(y)] \,\Bigr]. \qquad (2.64)$$


Let $x$ be chosen to be a point at which $|u(x) - w(x)|$ is within $\epsilon_6$ of its supremum, $d$. Then, since

$$\int_0^z |u(z - v) - w(z - v)|\, dF(v) + |u(0) - w(0)|\,[1 - F(z)] \le d \int_0^z dF(v) + d[1 - F(z)] = d, \qquad (2.65)$$

we obtain, from (2.64),

$$d - \epsilon_6 \le \epsilon_5 + a d, \qquad (2.66)$$

which yields $d(1 - a) \le \epsilon_5 + \epsilon_6$. Since $1 - a > 0$ and $\epsilon_5$ and $\epsilon_6$ may be chosen arbitrarily small, $d$ must necessarily be zero.


2.6.  Definition  of  a  Solution 

We have shown that the functional equation in (2.1) possesses a unique bounded solution. It is clear that this function defines a strategy $S$, since the first choice will be $A_k$, where $k$ is the index that maximizes $g_k(p) + b_k(p) f(T_k p)$, the second choice being similarly determined by the expression for $f(T_k p)$, and so on.

The question arises as to whether or not this is actually an optimal strategy, and, if so, as to whether or not it is unique. That it is an optimal strategy we see by the following argument: Suppose that we have another prescription $S_a$ for determining an optimal yield. This prescription defines a function $\varphi(p)$ that must satisfy the same functional equation, (2.1), because if it did not, it would not possess the necessary optimal continuation policy. Hence, $\varphi(p) = f(p)$. Since $S$ yields $f(p)$, we see that no other policy $S_a$ is preferable.

It  is  not  necessarily  true  that  S  is  unique.  This  arises  from  the  fact  that  for  various  p’s 
several  choices  may  be  equivalent,  although  the  continuations  from  equivalent  choices  will 
be  quite  different.  We  shall  subsequently  meet  a  very  simple  example  of  this.  We  observe, 
however,  that  the  functional  equation  permits  us  to  obtain  all  optimal  strategies. 
Another question that arises is one as to the precise a priori definition of $f(p)$. The definitions given in the Introduction are loose, since it is not clear that the required optimal procedures exist.

There  are  several  alternative  procedures,  corresponding  to  the  techniques  of  successive 
approximation  that  we  employed. 

We may define, by ukase, $f(p)$ to be the solution of our functional equation, using the existence and uniqueness of the solution as a de facto justification. Or we may define $f_N(p)$, unambiguously, as the maximum return when only $N$ stages are allowed, and set

$$f(p) = \lim_{N \to \infty} f_N(p) \qquad (2.67)$$

whenever this limit exists. Or we may ambitiously consider the space of all sequences of decisions $S = A_1^{a_1} A_2^{a_2} \cdots$, remembering that the exponents, $a_i$, are, in general, random variables depending on the pattern of events and not merely fixed in advance, and define

$$f(p) = \max_S f_S(p), \qquad (2.68)$$

when it exists. In general it will be clear that $\sup_S f_S(p)$ exists, and an essential part of the problem will be to show that the maximum is actually attained. That the maximum is actually attained may be demonstrated by use of the functional equations or abstract topological techniques, which we shall not present here.

From  the  mathematical  standpoint  it  would  seem  that  (2.67)  is  a  preferable  definition, 
since  it  furnishes  a  stronger  hold  on  f(p)  than  does  (2.68).  However,  since  the  two 
definitions  lead  to  the  same  function,  it  is  actually  convenience  that  decides  which  to  treat 
as  fundamental  in  any  particular  problem. 

2.7.  Approximate  and  Computational  Methods 

At  this  point  it  must  be  confessed  that  in  the  theory  of  dynamic  programming,  as  in 
most  other  theories  treating  of  the  physical  world,  the  majority  of  the  functional  equations 
that  arise  will  be  resolutely,  if  impartially,  insoluble  by  analytic  means,  as  far  as  explicit 
solutions  are  concerned.  Consequently,  the  theoretical  and  practical  development  of  the 
theory  requires  that  efficient  and  readily  applicable  approximate  and  computational 
methods  be  developed. 

In  theory  there  is  only  one  method  that  may  be  used  to  approximate  the  solution  of 
a  functional  equation,  namely,  the  solution  of  an  approximate  functional  equation.  In 
practice  the  variants  of  this  technique  differ  greatly. 

Let  us  write  our  functional  equation  in  the  form 

$$f = T(f, P), \qquad (2.69)$$

where $f$ represents the unknown function, $T$ is the transformation induced upon $f$ by the physical process, and $P$ is a quantity representing various parameters that occur, constants and functions.

The method of successive approximations in its usual guise relies on solving the following approximate equation

$$f_{n+1} = T(f_n, P), \qquad (2.70)$$

where $f_0$ is a suitable first guess at the solution. In more refined applications, (2.70) is replaced by

$$f_{n+1} - R(f_{n+1}) = T(f_n, P) - R(f_n), \qquad (2.71)$$

where $R(f)$ is a transformation so chosen as to force $f_n$ to possess certain desired properties or so chosen as to increase the rapidity of convergence.

In  place  of  the  above  approach,  we  may  consider  the  equation 

$$f = T(f, P'), \qquad (2.72)$$

where $P'$ represents a modified set of parameters so chosen that the solution of (2.72) may be obtained in a simple way. Thus, for example, in the theory of differential equations, the treatment of non-linear and linear equations with variable coefficients depends to an enormous extent on the fortunate circumstance that linear equations with constant coefficients are explicitly solvable in terms of exponentials. Similarly, in the consideration of the functional equations occurring in the theory of dynamic programming, any results that may be obtained under the simplifying hypotheses of linearity, convexity, and so on are extremely important, insofar as they furnish guides to the behavior of the actual solutions. The justification, from the larger point of view, of searching for complete solutions of simplified equations lies precisely in the hope and expectation of using these special solutions as approximations, not only quantitatively but also qualitatively, to the solutions of the more realistic and complicated equations.

The functional equations that we treat of afford yet a third approach, which arises from the duality between function space and strategy space. In place of the original class of transformations, we may consider a subclass obtained by restricting the permissible choices. Thus, for example, in place of infinite processes, we may consider finite processes; in place of three-choice processes, we may consider two-choice processes.

Employing the optimal policies for the simplified model, we obtain approximate policies for the larger model. A computation will then yield an approximate solution to the functional equation. It is clear mathematically and intuitively that if the method of successive approximations is now employed, the convergence will be monotone, an important fact from the computational standpoint.

The essential idea behind the preceding method is that we obtain suitable first approximations most readily by approximating in the strategy space rather than in the function space. It is in this fashion that the physical process generating the functional equation can best be exploited, and that experience and intuition gained by solving simpler problems can be most efficiently utilized.
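The monotone convergence obtained from a first approximation in strategy space can be illustrated on a small deterministic problem. The sketch below is hypothetical throughout: the three states, two choices, rewards, transitions, and the constant discount factor `B` are invented for illustration and do not come from the text. The first approximation is the return of a fixed policy (always choice 0); applying the maximizing transformation repeatedly can then only increase the values.

```python
# Sketch: start successive approximations from the return of a fixed policy.
# The three-state example data below are illustrative assumptions.

REWARD = [[1.0, 0.0, 2.0],   # REWARD[k][s]: immediate return of choice k in state s
          [0.5, 3.0, 0.0]]
NEXT = [[1, 2, 0],           # NEXT[k][s]: state reached from s under choice k
        [2, 0, 1]]
B = 0.5                      # constant discount factor, 0 < B < 1


def policy_return(policy, sweeps=200):
    """Return of a fixed policy, obtained by iterating its own linear equation."""
    f = [0.0] * len(policy)
    for _ in range(sweeps):
        f = [REWARD[policy[s]][s] + B * f[NEXT[policy[s]][s]]
             for s in range(len(policy))]
    return f


def improve(f):
    """One step of successive approximation: maximize over the choices."""
    return [max(REWARD[k][s] + B * f[NEXT[k][s]] for k in range(len(REWARD)))
            for s in range(len(f))]


# First approximation taken in strategy space: always use choice 0.
approximations = [policy_return([0, 0, 0])]
for _ in range(60):
    approximations.append(improve(approximations[-1]))
```

Starting instead from an arbitrary guess in function space would still converge here, since the transformation is a contraction, but the successive values need not move in one direction.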

2.8.  A  Geometric  Technique 

Let us now describe an interesting approach particularly applicable to a certain class of problems of deterministic type, in particular to equations of the form

$$f(x, y) = \max_i \bigl[ A_i:\ P_i(x, y) + c_i f(r_i x, s_i y) \bigr], \qquad (2.73)$$

$0 < r_i, s_i, c_i < 1$, where $P_i(x, y)$ is a homogeneous polynomial in $x$ and $y$. Since the solution will be homogeneous in $x$ and $y$, it is sufficient to consider only values of $x$ and $y$ for which $x + y = 1$.

For any given $x$ and $y$ we may write

$$x' = \frac{x}{x + y}, \quad y' = \frac{y}{x + y}, \quad f(x', y') = \frac{f(x, y)}{(x + y)^k}, \qquad (2.74)$$

$k$ being the degree of homogeneity.

Any strategy in (2.73) has the form

$$S = A_{a_1} A_{a_2} \cdots. \qquad (2.75)$$

Employing this strategy, we obtain

$$f_S(x, y) = P_{a_1}(x, y) + \cdots, \qquad (2.76)$$

a homogeneous function of $x$ and $y$. This function $f_S(x, y)$ may now be regarded as a function of one variable, $x$, $0 \le x \le 1$, $f_S(x, y) = f_S(x)$. To each strategy, $S$, in consequence, corresponds a continuous curve $f_S(x)$.

If from all these curves we now form the envelope, above, we obtain a new curve,

$$E(x) = \operatorname{Env}_S f_S(x), \qquad (2.77)$$

which must necessarily be $f(x, y) = f(x)$.

Although in general the envelope will be difficult to obtain explicitly, various qualitative features of the solution may often be obtained readily. An example of the application of this technique will be given in Section 3.12.

In the important case in which the $P_i(x, y)$ are linear functions of $x$ and $y$, the envelope curve is convex, a result of great utility in the application of the method.
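The envelope construction can be checked numerically in a small case. The sketch below makes several assumptions that go beyond the text: it uses a two-choice instance with linear $P_i$ (choice A returns `alpha * x` and sends $(x, y)$ into $(cx, y)$ with weight `p`; choice B returns `beta * y` and sends $(x, y)$ into $(x, dy)$ with weight `q`), it truncates every strategy to a hypothetical `STAGES` choices, and all parameter values are invented. It compares the upper envelope of the strategy curves on $x + y = 1$ with the value obtained from the recurrence.

```python
from itertools import product

# Illustrative parameters (assumptions, not taken from the text).
PARAMS = dict(alpha=1.0, beta=1.0, p=0.6, q=0.4, c=0.5, d=0.5)
STAGES = 8


def strategy_return(strategy, x, y, alpha, beta, p, q, c, d):
    """Return of a fixed sequence of choices; linear in x and y."""
    total, weight = 0.0, 1.0
    for choice in strategy:
        if choice == "A":
            total += weight * alpha * x
            weight *= p
            x *= c
        else:
            total += weight * beta * y
            weight *= q
            y *= d
    return total


def envelope(x, y, stages=STAGES):
    """Upper envelope over all strategies of the given length."""
    return max(strategy_return(s, x, y, **PARAMS)
               for s in product("AB", repeat=stages))


def f_stages(x, y, stages=STAGES):
    """The same quantity computed from the functional recurrence."""
    if stages == 0:
        return 0.0
    a_first = PARAMS["alpha"] * x + PARAMS["p"] * f_stages(PARAMS["c"] * x, y, stages - 1)
    b_first = PARAMS["beta"] * y + PARAMS["q"] * f_stages(x, PARAMS["d"] * y, stages - 1)
    return max(a_first, b_first)


grid = [i / 10 for i in range(11)]
curve = [envelope(x, 1.0 - x) for x in grid]      # E(x) on the line x + y = 1
```

Because each strategy curve is linear in $x$ along $x + y = 1$, the tabulated `curve` is a maximum of straight lines, which is why it comes out convex in this linear case.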


CHAPTER  3 


THE  GOLD-MINING  EQUATION 


3.1.  Introduction 

In this chapter we turn to a more detailed study of the "gold-mining" equation, beginning with the simplest representative,

$$f(x, y) = \max \begin{bmatrix} A:\ p_1[x + f(0, y)] + p_2[c_1 x + f(c_2 x, y)] \\ B:\ q_1[y + f(x, 0)] + q_2[d_1 y + f(x, d_2 y)] \end{bmatrix}, \qquad (3.1)$$

where $x, y \ge 0$, and the constants that appear are subject to the following conditions:

(a) $0 \le p_1, p_2, q_1, q_2 \le 1$, $p_1 + p_2 < 1$, $q_1 + q_2 < 1$,

(b) $0 < c_1, d_1 < 1$,

(c) $c_1 + c_2 = 1$, $d_1 + d_2 = 1$. (3.2)


The  origin  of  this  equation  was  discussed  in  Problem  1.5,  page  2,  and  the  required 
existence  and  uniqueness  theorems  were  given  in  Chapter  2. 

We shall begin by presenting a solution to (3.1) and also some generalizations. We shall then consider the equation

$$f(x, y, a) = \max \begin{bmatrix} A:\ p_1 f(0, y, a + x) + p_2 f(c_2 x, y, a + c_1 x) + p_3 \phi(a) \\ B:\ q_1 f(x, 0, a + y) + q_2 f(x, d_2 y, a + d_1 y) + q_3 \phi(a) \end{bmatrix}, \qquad (3.3)$$

$x, y, a \ge 0$, with $f(0, 0, a) = \phi(a)$, which arises when we use as a criterion function $\phi(z)$ in place of $z$, where $z$ is the total yield.

This equation may be solved explicitly in the case in which $\phi(z) = z$, as above, and in the case in which $\phi(z) = e^{bz}$. The asymptotic form of the solution for large $x$ and $y$ will be given in this latter case.

After this we shall discuss briefly some extensions of (3.1) that are at present obdurately resisting analytic solution.

Turning  from  this  analytic  treatment,  we  shall  then  present  an  interesting  geometric 
treatment  of  (3.1),  using  the  ideas  of  Section  2.8. 




3.2. The Solution of Equation (3.1)

The purpose of this section is to provide an introduction to the analytic techniques we shall employ throughout the chapter and to demonstrate

Theorem 3.1. Consider the functional equation

$$f(x, y) = \max \begin{bmatrix} A:\ \sum_{k=1}^{N} p_k [c_k x + f(c'_k x, y)] \\ B:\ \sum_{k=1}^{M} q_k [d_k y + f(x, d'_k y)] \end{bmatrix}, \qquad (3.4)$$

where

(a) $p_k \ge 0$, $q_k \ge 0$, $\sum_{k=1}^{N} p_k < 1$, $\sum_{k=1}^{M} q_k < 1$,

(b) $1 \ge c_k, d_k \ge 0$, $c_k + c'_k = d_k + d'_k = 1$,

(c) $x, y \ge 0$. (3.5)

The optimal choice of operations is the following: If

$$\frac{\sum_{k=1}^{N} p_k c_k}{1 - \sum_{k=1}^{N} p_k}\, x > \frac{\sum_{k=1}^{M} q_k d_k}{1 - \sum_{k=1}^{M} q_k}\, y, \qquad (3.6)$$

choose A; if the reverse inequality holds, choose B. In case of equality, either choice is satisfactory.

To simplify the notation and the algebra, let us consider first the simpler form of (3.4) given by (3.1). As noted above, we already know from Chapter 2 that there is a unique solution to this equation. Let us turn, then, to a discussion of some of the simpler properties of $f(x, y)$. Since $p_1 + p_2 < 1$, $q_1 + q_2 < 1$, it follows that $f(0, 0) = 0$. From the fact that $f(kx, ky)$ and $k f(x, y)$ satisfy the same equation for $k > 0$, it follows that $f(kx, ky) = k f(x, y)$, for $k > 0$. Setting $y = 0$ and using $f(c_2 x, 0) = c_2 f(x, 0)$, we obtain

$$f(x, 0) = \max \begin{bmatrix} A:\ (p_1 + p_2 c_1)x + p_2 c_2 f(x, 0) \\ B:\ (q_1 + q_2) f(x, 0) \end{bmatrix} = (p_1 + p_2 c_1)x + p_2 c_2 f(x, 0), \qquad (3.7)$$

whence

$$f(x, 0) = \frac{(p_1 + p_2 c_1)x}{1 - p_2 c_2}, \qquad (3.8)$$

and, similarly,

$$f(0, y) = \frac{(q_1 + q_2 d_1)y}{1 - q_2 d_2}. \qquad (3.9)$$

These results are, of course, obvious if we consider the process generating the function. On these grounds we should also suspect that A would be employed whenever $y$ was sufficiently small compared with $x$. This fact follows from the continuity of $f(x, y)$ (compare Section 2.2), since the inequality

$$f(x, y) > (q_1 + q_2 d_1)y + q_1 f(x, 0) + q_2 f(x, d_2 y) \qquad (3.10)$$

must hold for small positive $y = y(x)$, for $x > 0$, seeing that it is valid for $y = 0$.

It  follows  then  that  there  are  two  regions,  close  to  the  x  and  y  axes,  in  which  the 
optimal  choices  are,  respectively,  A  and  B,  whenever  (x,  y)  is  contained  in  either  of  these 
regions,  as  shown  in  Fig.  3.1. 

It  is  reasonable  to  suppose  that  the  solution  has  the  form  shown  in  Fig.  3.2.  The 
meaning  of  Fig.  3.2  is  that  A  is  employed  whenever  (x,  y)  is  in  RA,  the  region  between 
the  x-axis  and  L,  and  B  is  employed  in  the  complementary  region.  On  the  line  L  either  A 
or  B  may  be  used. 




That the boundary curve, if it exists, must be a straight line follows from the homogeneity of $f(x, y)$. Assuming that the solution has this form, we shall show that the equation of $L$ may be calculated from the fact that it is an indifference curve. By this term we mean that for points $(x, y)$ on the curve, the value of the function $f(x, y)$ is the same whether we employ A or B.

Observe  that  the  effect  of  employing  A  is  always  to  drive  P  into  RB,  whereas  the  use 
of  B  sends  P  into  RA.  Consequently,  if  A  is  used  at  P,  the  next  choice,  in  an  optimal 
policy,  must  be  B,  and  vice  versa  if  B  is  used. 

This  alone  would  not  be  sufficient  to  determine  L,  were  it  not  for  another  fact.  Since 
the  operations  A  and  B  operate  on  x  and  y  alone,  there  will  be  a  certain  symmetry  in  the 
results  obtained  by  using  A  and  then  B,  or  B  and  then  A,  which  plays  a  decisive  role  in 
the  solution. 

Let us now do a small amount of computing. Using the values of $f(x, 0)$ and $f(0, y)$ obtained above, we have

$$f(x, y) = \max \begin{bmatrix} A:\ (p_1 + p_2 c_1)x + \dfrac{p_1(q_1 + q_2 d_1)y}{1 - q_2 d_2} + p_2 f(c_2 x, y) \\[2ex] B:\ (q_1 + q_2 d_1)y + \dfrac{q_1(p_1 + p_2 c_1)x}{1 - p_2 c_2} + q_2 f(x, d_2 y) \end{bmatrix}. \qquad (3.11)$$
To simplify the notation, let us denote the coefficients of $x$ and $y$ in the above equation by $\alpha_1, \alpha_2$ in A and by $\beta_1, \beta_2$ in B. If we employ A, we obtain, using an obvious notation,

$$f_A(x, y) = \alpha_1 x + \alpha_2 y + p_2 f(c_2 x, y). \qquad (3.12)$$

Following this by B, we have

$$f_{AB}(x, y) = (\alpha_1 + \beta_1 p_2 c_2)x + (\alpha_2 + p_2 \beta_2)y + p_2 q_2 f(c_2 x, d_2 y). \qquad (3.13)$$

Similarly, the result of B and then A is

$$f_{BA}(x, y) = (\beta_1 + q_2 \alpha_1)x + (\beta_2 + q_2 \alpha_2 d_2)y + p_2 q_2 f(c_2 x, d_2 y). \qquad (3.14)$$

If $(x, y)$ lies upon $L$, we must have $f_{AB} = f_{BA}$. Equating the two expressions, we observe that the unknown function $f(c_2 x, d_2 y)$ disappears. Consequently, we obtain for $L$ the equation

$$[\alpha_1(1 - q_2) + \beta_1(p_2 c_2 - 1)]\,x = [\alpha_2(q_2 d_2 - 1) + \beta_2(1 - p_2)]\,y. \qquad (3.15)$$

Using the precise values of $\alpha_1, \beta_1, \alpha_2, \beta_2$ as given by (3.11), we finally obtain, as the equation of $L$,

$$\frac{(p_1 + p_2 c_1)x}{1 - p_1 - p_2} = \frac{(q_1 + q_2 d_1)y}{1 - q_1 - q_2}. \qquad (3.16)$$


This is a remarkably simple equation, since, as we observe, the coefficient of $x$ depends only on the A operation, while the coefficient of $y$ depends only on the B operation. Furthermore, each coefficient admits of a very simple interpretation as the ratio of the expected yield of the operation to the probability of termination of the process.
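The passage from (3.15) to (3.16) is a short piece of algebra that is easy to confirm numerically. The sketch below is an illustration only; the function names and the sample parameter values are ours. It evaluates the two bracketed coefficients of (3.15) from the definitions of $\alpha_1, \alpha_2, \beta_1, \beta_2$ read off from (3.11), and compares them with the coefficients of (3.16) after multiplying that equation through by $(1 - p_1 - p_2)(1 - q_1 - q_2)$.

```python
def coefficients_3_15(p1, p2, c1, q1, q2, d1):
    """Coefficients of x and y in (3.15), built from alpha and beta of (3.11)."""
    c2, d2 = 1.0 - c1, 1.0 - d1
    alpha1 = p1 + p2 * c1
    alpha2 = p1 * (q1 + q2 * d1) / (1.0 - q2 * d2)
    beta1 = q1 * (p1 + p2 * c1) / (1.0 - p2 * c2)
    beta2 = q1 + q2 * d1
    x_coeff = alpha1 * (1.0 - q2) + beta1 * (p2 * c2 - 1.0)
    y_coeff = alpha2 * (q2 * d2 - 1.0) + beta2 * (1.0 - p2)
    return x_coeff, y_coeff


def coefficients_3_16(p1, p2, c1, q1, q2, d1):
    """Coefficients of (3.16) after clearing both denominators."""
    x_coeff = (p1 + p2 * c1) * (1.0 - q1 - q2)
    y_coeff = (q1 + q2 * d1) * (1.0 - p1 - p2)
    return x_coeff, y_coeff


# Two arbitrary admissible parameter sets (p1, p2, c1, q1, q2, d1).
samples = [(0.2, 0.5, 0.4, 0.3, 0.3, 0.5), (0.1, 0.6, 0.7, 0.25, 0.45, 0.2)]
checks = [(coefficients_3_15(*s), coefficients_3_16(*s)) for s in samples]
```

The agreement reflects the identities $\beta_1 (1 - p_2 c_2) = q_1 \alpha_1$ and $\alpha_2 (1 - q_2 d_2) = p_1 \beta_2$, which are what make the unknown denominators cancel.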

Let us insert a word of warning: Although this elegant result holds for some generalizations of the functional equation, it does not hold in general, as we shall subsequently see.

Let us now prove that the solution actually has this simple form. To make the previous argument rigorous, we observe that below $L$, the procedure consisting of A, B, and an optimal continuation is superior to B, A, and an optimal continuation, and that the reverse is true above $L$. Referring to Fig. 3.1, let $Q$ be a point above the known A-region and far enough below $L$ so that any outcome of a B-choice transforms $Q(x, y)$ into the known A-region.

To show that A is used at $Q$, we argue by contradiction. Suppose that B were used; then the next choice would necessarily be A. However, we have seen, above, that below $L$, the procedure consisting of B, A, and an optimal continuation is inferior to A, B, and an optimal continuation. Hence, A is used at $Q$. It is clear that we may continue this argument until we have demonstrated that the region between $L$ and the x-axis is an A-region. Similarly, starting from the known B-region, we may demonstrate that the region above $L$ is a B-region.

We have carried through the proof for the simplest case of (3.4). There is no difficulty in verifying that the argument is general.

Geometrically, the pattern is as follows. When $(x, y)$ is in $R_A$, A is employed until the resultant point is in $R_B$, at which time B is employed until the point is again in $R_A$, and so on.

3.3.  A  Generalization 

There is no difficulty in extending the above analysis to the following $N$-dimensional equation

$$f(x_1, x_2, \ldots, x_N) = \max_i \Bigl[ \sum_k p_{ik} \bigl[ c_{ik} x_i + f(x_1, x_2, \ldots, c'_{ik} x_i, \ldots, x_N) \bigr] \Bigr], \qquad (3.17)$$

where

(a) $p_{ik} \ge 0$, $\sum_k p_{ik} < 1$, $i = 1, 2, \ldots, N$,

(b) $1 \ge c_{ik} \ge 0$, $c_{ik} + c'_{ik} = 1$,

(c) $x_i \ge 0$. (3.18)

The decision functions are again the ratios of expected gain to probability of termination, namely,

$$D_i(x_i) = \frac{\sum_k p_{ik} c_{ik}}{1 - \sum_k p_{ik}}\, x_i. \qquad (3.19)$$

If $\max_i D_i(x_i)$ is attained for $i = L$, then the $L$th choice is made unless there is equality, in which case any one of the maximizing choices is optimal.
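The decision rule based on (3.19) amounts to computing one ratio per operation and taking the largest. A minimal sketch follows; the function names and the sample data are hypothetical, and each operation is described by its outcome probabilities, the corresponding fractions $c_{ik}$ extracted, and the amount $x_i$ currently available.

```python
def decision_index(probs, fractions, amount):
    """Ratio of expected gain to probability of termination, as in (3.19)."""
    expected_gain = sum(p * c for p, c in zip(probs, fractions)) * amount
    termination = 1.0 - sum(probs)
    return expected_gain / termination


def best_choices(operations):
    """Indices of all operations attaining the maximum decision index."""
    scores = [decision_index(*op) for op in operations]
    top = max(scores)
    return [i for i, s in enumerate(scores) if s == top], scores


# Illustrative data: (outcome probabilities, fractions extracted, amount available).
operations = [
    ((0.2, 0.5), (1.0, 0.4), 1.0),
    ((0.3, 0.3), (1.0, 0.5), 1.0),
    ((0.1, 0.1), (1.0, 0.9), 2.0),
]
choices, scores = best_choices(operations)
```

Returning a list of indices rather than a single index keeps the case of equality visible, in which any of the maximizing choices may be used.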


3.4. The Form of f(x, y)


Having obtained a very simple characterization of the optimal policy, let us now turn our attention to the function $f(x, y)$. In general, no simple analytic representation will exist. If, however, we consider Eq. (3.1), which we write again as

$$f(x, y) = \max \begin{bmatrix} \alpha_1 x + \alpha_2 y + p_2 f(c_2 x, y) \\ \beta_1 x + \beta_2 y + q_2 f(x, d_2 y) \end{bmatrix}, \qquad (3.20)$$

we shall show that if $c_2$ and $d_2$ are connected by a relation of the type $c_2^m = d_2^n$, $m$ and $n$ being positive integers, we shall obtain piecewise linear representations for $f(x, y)$.

It is sufficient, in order to illustrate the technique, to consider the simplest case, $c_2 = d_2$.

Let $(x, y)$ be a point in the A-region. If A is applied, either $(x, y)$ goes into $(0, y)$, in which case B is used continually thereafter, or it is transformed into $(c_2 x, y)$, which may be in either an A- or a B-region. Let $L_1$ be the line that is transformed into $L$ when $(x, y)$ goes into $(c_2 x, y)$, let $L_2$ be the line transformed into $L_1$, and so on. Similarly, let $M_1$ be the line transformed into $L$ when $(x, y)$ goes into $(x, d_2 y)$, and so on. In the sector $L O L_1$, A is used first, followed by B, as shown in Fig. 3.3.


Hence, for $(x, y)$ in this sector we obtain

$$\begin{aligned} f(x, y) &= \alpha_1 x + \alpha_2 y + p_2 f(c_2 x, y) \\ &= \alpha_1 x + \alpha_2 y + p_2(\beta_1 c_2 x + \beta_2 y) + p_2 q_2 f(c_2 x, c_2 y) \\ &= (\alpha_1 + p_2 \beta_1 c_2)x + (\alpha_2 + p_2 \beta_2)y + p_2 q_2 c_2 f(x, y). \end{aligned} \qquad (3.21)$$

This yields

$$f(x, y) = \frac{(\alpha_1 + p_2 \beta_1 c_2)x + (\alpha_2 + p_2 \beta_2)y}{1 - p_2 q_2 c_2} \qquad (3.22)$$

for $(x, y)$ in $L O L_1$. Similarly, we obtain a linear expression for $f$ in $L O M_1$. Having obtained the representations in these sectors, it is clear that we obtain linear expressions in $L_1 O L_2$, etc.

3.5. The Problem for a Finite Number of Stages

Let us now consider the problem that arises when only a finite number of stages are allowed. If we set

$$f_N(x, y) = \text{expected return using an optimal } N\text{-stage policy}, \qquad (3.23)$$

then

$$\begin{aligned} f_1(x, y) &= \max \bigl[ (p_1 + p_2 c_1)x,\ (q_1 + q_2 d_1)y \bigr], \\ f_{N+1}(x, y) &= \max \begin{bmatrix} A:\ p_1[x + f_N(0, y)] + p_2[c_1 x + f_N(c_2 x, y)] \\ B:\ q_1[y + f_N(x, 0)] + q_2[d_1 y + f_N(x, d_2 y)] \end{bmatrix}. \end{aligned} \qquad (3.24)$$
We know from the results concerning existence and uniqueness in Section 2.2 that, as $N \to \infty$, $f_N(x, y) \to f(x, y)$. However, it is not reasonable to suspect that for each $N$ the optimal policy will be that of $f(x, y)$. Furthermore, it is clear that, in general, the policies will not be the same for $N = 1$.

It does, however, follow from our previous argumentation that if for some $N$ the decision regions of $f_N(x, y)$ and $f(x, y)$ coincide, they must do so for all larger $N$.
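The recurrence (3.24) lends itself to direct computation, since from a starting point $(x, y)$ only points of the form $(c_2^i x, d_2^j y)$, with either coordinate possibly replaced by zero, can be reached. The sketch below is a minimal illustration under assumed parameter values (the constants `P1`, `P2`, `C1`, `Q1`, `Q2`, `D1` are invented); it tabulates $f_N$ by memoized recursion and compares the first choice of a long finite process with the choice prescribed by the boundary line (3.16).

```python
from functools import lru_cache

# Illustrative parameters (assumptions, not taken from the text).
P1, P2, C1 = 0.2, 0.5, 0.4    # operation A
Q1, Q2, D1 = 0.3, 0.3, 0.5    # operation B
C2, D2 = 1.0 - C1, 1.0 - D1


def n_stage(x, y, stages):
    """Return (f_N(x, y), optimal first choice) from the recurrence (3.24)."""

    @lru_cache(maxsize=None)
    def f(i, j, n):
        # i and j count the partial extractions made so far;
        # -1 marks a coordinate that has been reduced to zero.
        if n == 0:
            return 0.0, None
        xv = 0.0 if i < 0 else x * C2 ** i
        yv = 0.0 if j < 0 else y * D2 ** j
        i_next = -1 if i < 0 else i + 1
        j_next = -1 if j < 0 else j + 1
        a_val = P1 * (xv + f(-1, j, n - 1)[0]) + P2 * (C1 * xv + f(i_next, j, n - 1)[0])
        b_val = Q1 * (yv + f(i, -1, n - 1)[0]) + Q2 * (D1 * yv + f(i, j_next, n - 1)[0])
        return (a_val, "A") if a_val >= b_val else (b_val, "B")

    return f(0, 0, stages)


def limiting_rule(x, y):
    """Choice prescribed for the unbounded process by the line (3.16)."""
    index_a = (P1 + P2 * C1) * x / (1.0 - P1 - P2)
    index_b = (Q1 + Q2 * D1) * y / (1.0 - Q1 - Q2)
    return "A" if index_a > index_b else "B"


points = [(1.0, 0.5), (1.0, 1.0), (1.0, 2.0)]
comparison = [(n_stage(x, y, 60)[1], limiting_rule(x, y)) for x, y in points]
one_stage_choice = n_stage(1.0, 1.0, 1)[1]
```

With these particular values the one-stage process at $(1, 1)$ prefers B, since $(q_1 + q_2 d_1) > (p_1 + p_2 c_1)$, whereas the limiting rule prefers A there; this is an instance of the remark that the policies need not agree for $N = 1$.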

To show that the regions need not coincide for $N = 2$, consider the following simple example

$$f_{N+1}(x, y) = \max \begin{bmatrix} \alpha x + p f_N(cx, y) \\ \beta y + q f_N(x, dy) \end{bmatrix}, \quad N = 1, 2, \ldots, \qquad (3.25)$$

where $\alpha, \beta > 0$, $0 < c, d < 1$, $0 < p, q < 1$. For $N = 1$, we have $f_1 = \max[\alpha x, \beta y]$. We may take $\alpha = \beta$, since this is equivalent to changing the $x$ or $y$ scale. The boundary line for $N = 1$ is then $x = y$, which we call $L_1$. For $N = 2$, we consider the possible strategies AA, AB, BA, BB. We then have the following boundary curves:


A  —  B, 

x  =  y, 

AA  =  BA, 

y  =  (\  +  pc  - 

BB  =  BA, 

X 

y  =  7' 

AB  =  AA, 

y  =  rx, 

AB  =  BA, 

(1  -  7) 

y  =■  - - * 

y  o-/>) 

,  the  lines  will  have  the  relative  p 

2,  the  decision 

regions  will  be  as  ! 

(3.26) 


[Fig. 3.4 and Fig. 3.5: boundary lines labeled BB = BA, AB = AA, AA = BA, and AB = BA.]
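The boundary lines in (3.26) can be checked by writing out the four two-stage returns of (3.25). The sketch below assumes, as in the text, the normalization $\alpha = \beta$, here taken equal to 1; the numerical values of $p$, $q$, $c$, $d$ are illustrative assumptions.

```python
def two_stage_returns(x, y, p, q, c, d):
    """Returns of the four two-stage strategies of (3.25), with alpha = beta = 1."""
    return {
        "AA": x + p * c * x,   # A, then A at (cx, y)
        "AB": x + p * y,       # A, then B at (cx, y)
        "BA": y + q * x,       # B, then A at (x, dy)
        "BB": y + q * d * y,   # B, then B at (x, dy)
    }


def best_two_stage(x, y, p, q, c, d):
    """Strategy (or one of the strategies) with the largest two-stage return."""
    returns = two_stage_returns(x, y, p, q, c, d)
    return max(returns, key=returns.get)


p, q, c, d = 0.6, 0.4, 0.5, 0.5          # illustrative values
x = 1.0
on_ab_ba = two_stage_returns(x, (1 - q) / (1 - p) * x, p, q, c, d)
on_aa_ba = two_stage_returns(x, (1 + p * c - q) * x, p, q, c, d)
on_bb_ba = two_stage_returns(x, x / d, p, q, c, d)
on_ab_aa = two_stage_returns(x, c * x, p, q, c, d)
```

Each of the four dictionaries is evaluated at a point on one of the lines of (3.26), so the two strategies named by that line should give equal returns there.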


Let us now show that decision regions for $f_N$ converge toward that of $f$ as $N \to \infty$, and that there will always be an $N_0$ with the property that for $N \ge N_0$ the regions will coincide.

The proof is very simple. Consider the situation for $N = 3$, as in Fig. 3.6.

Let $L_2(A^{-1})$ denote the line that is transformed into $L_2$ when $(x, y)$ goes into $(cx, y)$. Let $Q$ be in the sector between $L_2$ and $L_2(A^{-1})$. If A is used at $Q$, then B is used next, since the transformed point is in the $R_B$-region for $N = 2$. If $Q$ is above $L_\infty$, we know that AB is inferior to BA, regardless of $N$, as a set of first two choices. Hence, B is used at $Q$. This shows us that the B-region for the $N$-stage process is at least that containing the sector bounded by the y-axis and $L_2(A^{-1})$. This process continues until $L_2(A^{-k})$, for some $k$, lies below $L_\infty$, which must necessarily occur after some finite number of stages.

The argument is general and applies to the general equations discussed above. However, we cannot assert that the convergence is monotone, as we suspect, until we know more about the A- and B-regions for the $N$-stage process. It is probably true that there are two regions for each $N$, but this is a result that has only been demonstrated in the case of the simple equation (3.20).

To show this result, we use the fact that this equation arises from a model in which the results of an operation are known only as far as the expected outcome is concerned. Any $N$-stage policy has the form, therefore,

$$S_N = A^{a_1} B^{b_1} \cdots A^{a_k} B^{b_k}, \qquad (3.27)$$

where the $a_i$ and $b_i$ are 0 or positive integers. There are now two cases: $S_N$ is either equal to $A^N$ or $B^N$, or it has the form $A^k B \cdots$ or $B^l A \cdots$, where $k, l < N$.

Referring to Fig. 3.6, consider a point $Q$ above $L_\infty$. If an optimal policy has the form $A^k B \cdots$, $k < N$, which may be written $A^{k-1}(AB) \cdots$, it may be improved by replacing AB with BA, since A iterated any number of times maintains $Q$ above $L_\infty$. It follows then that in the region above $L_\infty$, either B is used first or A is used repeatedly; and, similarly, in the region below $L_\infty$, either A is used first or B is used repeatedly.

Since $A^N$ is clearly the optimal policy for points sufficiently close to the x-axis, and $B^N$ is the optimal policy for points sufficiently near the y-axis, it follows from the analytic form of the yield for any $S_N$ (an expression which is linear in $x$ and $y$) that if $A^N$ is used at $Q$, it is used for all points below the line $OQ$, and similarly for $B^N$, "below" being replaced by "above."

It follows that there are always two regions, separated either by AB = BA or by a line of more complicated form, if $A^N$ or $B^N$ are still dominant. For large $N$ it is clear that $A^N$ and $B^N$ become less and less influential, so that eventually AB = BA emerges as the sole dividing line.


3.6.  A  General  Utility  Function 

We have in the previous sections considered only the case in which the utility of a total yield $z$ was proportional to $z$. Let us now turn to the more interesting case in which the utility is measured by a function $\phi(z)$.

The non-linearity of $\phi(z)$ will, in general, require the introduction of a new state parameter, namely the quantity obtained as a result of the preceding operations. Denoting this quantity by $a$, we obtain the equation


$$\begin{aligned} f(x, y, a) &= \max \begin{bmatrix} A:\ p_1 f(0, y, a + x) + p_2 f(c_2 x, y, a + c_1 x) + p_3 \phi(a) \\ B:\ q_1 f(x, 0, a + y) + q_2 f(x, d_2 y, a + d_1 y) + q_3 \phi(a) \end{bmatrix}, \\ f(0, 0, a) &= \phi(a), \end{aligned} \qquad (3.28)$$

as noted in Section 3.1.

This equation is more difficult to treat of than that occurring for $\phi(z) = z$, and we shall only be able to present its solution for certain classes of functions.

We  have 


/(°.  7.  a)  =  Max 


A-  Pif(°’  y< a)  +  p2f(°>  y- a )  +  Ps<t>  (*) 

B:  qj( 0,  0,  a  +  y)  +  ^2/( 0,  d2y,  a  +  d3y)  +  q3<t>(a) 


]• 


(3.29) 


Since $f(x, y, a) \ge f(0, 0, a) = \phi(a)$ for $x, y \ge 0$, with strict inequality if x or y is positive, it follows, since $p_1 + p_2 + p_3 = 1$, that choice A cannot furnish the maximum in (3.29) when $y > 0$, and hence that

$$f(0, y, a) = q_1 \phi(a + y) + q_3 \phi(a) + q_2 f(0, d_2 y, a + d_1 y), \tag{3.30}$$

and, similarly, that

$$f(x, 0, a) = p_1 \phi(a + x) + p_3 \phi(a) + p_2 f(c_2 x, 0, a + c_1 x). \tag{3.31}$$


For given $\phi$, these equations may now be solved by iteration for the functions $f(0, y, a)$ and $f(x, 0, a)$.
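The iteration just mentioned can be sketched numerically. The following is a minimal illustration, not part of the original argument: it unrolls the recurrence for $f(x, 0, a)$ a fixed number of times, assuming $p_1 + p_2 + p_3 = 1$ and $c_1 + c_2 = 1$. The function name `f_on_x_axis`, the parameter `depth`, and the sample values are our own choices. For the linear utility $\phi(z) = z$ the result can be compared with the closed form $a + ex$, where $e = (p_1 + p_2 c_1)/(1 - p_2 c_2)$.

```python
def f_on_x_axis(x, a, phi, p1, p2, p3, c1, c2, depth=200):
    """Approximate f(x, 0, a) by unrolling the recurrence `depth` times."""
    total = 0.0
    weight = 1.0  # running product p2**n
    for _ in range(depth):
        total += weight * (p1 * phi(a + x) + p3 * phi(a))
        # partial success of A: x shrinks to c2*x, and c1*x is added to a
        a, x = a + c1 * x, c2 * x
        weight *= p2
    # after many steps x is negligible, so f(x, 0, a) is close to phi(a)
    return total + weight * phi(a)


# Linear utility phi(z) = z with illustrative parameter values.
value = f_on_x_axis(2.0, 1.0, lambda z: z, 0.5, 0.3, 0.2, 0.4, 0.6)
closed_form = 1.0 + (0.5 + 0.3 * 0.4) / (1 - 0.3 * 0.6) * 2.0
```

The same routine, with the roles of the p's and q's and of x and y interchanged, serves for $f(0, y, a)$.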

Let us again proceed formally before turning to a justification of our operations. It is clear from the conservative nature of the processes involved that the quantity $x + y + a$ remains constant throughout the sequence of operations. Consequently, the effect of any choice is to transform a point in the region R: $x + y + a = c$, $x, y, a \ge 0$, into another point in the region, as shown in Fig. 3.7 on page 40.

The problem that confronts us is that of determining the set of points in R in which A is used and the set in which B is used. If we assume, as before, that these sets constitute connected regions having a boundary curve P, we may proceed to find the boundary as before, using the fact that the boundary is an indifference curve.


However,  we  must  assume  more  about  the  boundary  curve  than  previously,  where  the 
fact  that  it  was  a  straight  line  resulted  in  considerable  simplification.  Let  us  assume  that 
the  result  of  applying  A  to  a  point  P  on  the  boundary  curve  is  to  transform  it  into  the 
B-region,  and  vice  versa. 

Having provided ourselves with a cushion of assumptions, let us now go through the calculations. If A is employed, we obtain

$$f(x, y, a) = p_1 f(0, y, a + x) + p_2 f(c_2 x, y, a + c_1 x) + p_3 \phi(a). \tag{3.32}$$

Employing B at $(0, y, a + x)$ and $(c_2 x, y, a + c_1 x)$, we obtain

$$\begin{aligned} f(x, y, a) = {} & p_1[q_1 \phi(a + x + y) + q_2 f(0, d_2 y, a + x + d_1 y) + q_3 \phi(a + x)] \\ & + p_2[q_1 f(c_2 x, 0, a + c_1 x + y) + q_2 f(c_2 x, d_2 y, a + c_1 x + d_1 y) + q_3 \phi(a + c_1 x)] \\ & + p_3 \phi(a). \end{aligned} \tag{3.33}$$

A similar expression is obtained by using B and then A. Equating the two, we obtain, for the equation of the boundary curve,

$$p_1 q_3 \phi(a + x) + p_2 q_3 \phi(a + c_1 x) + p_3 \phi(a) = q_1 p_3 \phi(a + y) + q_2 p_3 \phi(a + d_1 y) + q_3 \phi(a), \tag{3.34}$$

which may be written

$$p_1 q_3[\phi(a + x) - \phi(a)] + p_2 q_3[\phi(a + c_1 x) - \phi(a)] = q_1 p_3[\phi(a + y) - \phi(a)] + q_2 p_3[\phi(a + d_1 y) - \phi(a)]. \tag{3.35}$$

In order to establish the result rigorously, we must ascertain whether or not the boundary curve has the desired transformation property.

What we actually require is

Property T. If

$$\begin{aligned} F(x, y, a) = {} & p_1 q_3[\phi(a + x) - \phi(a)] + p_2 q_3[\phi(a + c_1 x) - \phi(a)] \\ & - q_1 p_3[\phi(a + y) - \phi(a)] - q_2 p_3[\phi(a + d_1 y) - \phi(a)] > 0, \end{aligned} \tag{3.36}$$

then $F(c_2 x, y, a + c_1 x) > 0$. If $F(x, y, a) < 0$, then $F(x, d_2 y, a + d_1 y) < 0$.

Unfortunately it seems to be difficult to present any simple criterion which will ensure that a general utility function $\phi(z)$ will satisfy Property T. It is not difficult to show, for example, that $\phi(z) = z^2$ does not satisfy it for all values of $p_i$ and $q_i$.

Let us now demonstrate

Theorem 3.2. If

(a) $\phi(z)$ is strictly increasing and continuous, $\phi(z) \ge 0$,

(b) Property T is satisfied, (3.37)

then the solution to (3.28) is given by

$$f(x, y, a) = p_1 f(0, y, a + x) + p_2 f(c_2 x, y, a + c_1 x) + p_3 \phi(a) \tag{3.38}$$

for $F(x, y, a) > 0$, and by

$$f(x, y, a) = q_1 f(x, 0, a + y) + q_2 f(x, d_2 y, a + d_1 y) + q_3 \phi(a) \tag{3.39}$$

for $F(x, y, a) < 0$.

The optimal policy is to apply A when $F(x, y, a) > 0$ and B if $F(x, y, a) < 0$. When there is equality, it is a matter of indifference as to which choice is made.

Proof. The proof is carried through in two stages. First we show that there is a region in the plane $x + y + a = c$ where A is always used, namely, a region close to $y = 0$. Then we consider what happens at a point Q in the region defined by $F(x, y, a) > 0$ and $x + y + a = c$.

Let us assume for the moment that we have already established the existence of a region where A is always used. If B is used at Q, it follows from Property T that the transformed point is again in the same region. It cannot be true that B is used repeatedly, if $x > 0$, since eventually the y coordinate will be so small that the point will be in the A-region. Hence, if at Q an optimal policy employs B for the first k choices, the sequence of moves has the form

$$S = \underbrace{BB \cdots B}_{k \text{ times}} A. \tag{3.40}$$

On the basis of Property T, we are still in the region $F(x, y, a) > 0$, $x + y + a = c$ after employing B $(k - 1)$ times. The next two moves, B and then A, cannot be optimal, however, since the region is defined by the property that AB plus optimal continuation is superior to BA plus optimal continuation. This shows that at Q, move B cannot be used first in an optimal policy.

It remains then to establish the existence of the A-region mentioned above. Since $f(x, y, a) > \phi(a)$ for $x, y \ge 0$ and one at least positive, it follows that

$$p_1 f(0, y, a + x) + p_2 f(c_2 x, y, a + c_1 x) + p_3 \phi(a) > q_1 f(x, 0, a + y) + q_2 f(x, d_2 y, a + d_1 y) + q_3 \phi(a), \tag{3.41}$$

which holds at $y = 0$, must by virtue of the continuity of the functions involved, for any $x > 0$, hold for some interval $0 \le y \le y(x, a)$.


3.7. The Exponential Utility Function

One way of obtaining utility functions that have the desired property, T, is to make the boundary equation independent of a. If we wish this to be true for all values of the parameters $p_i$ and $q_i$, we must have

$$\phi(a + x) - \phi(a) = G(x)H(a), \tag{3.42}$$

which yields, using standard arguments, under the assumption of continuity,*

(a) $\phi(z) = mz + n$, or

(b) $\phi(z) = n e^{bz}$. (3.43)


We  have  already  considered  the  first  utility  function;  let  us  now  consider  the  second. 

The important property of these utility functions is that a policy which maximizes the expected value of $\phi(z)$ proceeds at each stage without regard for the amount already obtained, being dependent only on the remaining amount to be obtained.

If we set, for $b > 0$,

$$g(x, y) = \max_P \operatorname{Exp}(e^{bz}) \tag{3.44}$$

("Exp" denotes here "expected value," not "exponential"), we obtain for g the functional equation

$$g(x, y) = \max \begin{cases} A: & p_1 e^{bx} g(0, y) + p_2 e^{b c_1 x} g(c_2 x, y) + p_3 \\ B: & q_1 e^{by} g(x, 0) + q_2 e^{b d_1 y} g(x, d_2 y) + q_3 \end{cases}. \tag{3.45}$$

As a special case of Theorem 3.2, we obtain

Theorem 3.3. The solution of (3.45) is as follows: For

$$p_1 q_3 (e^{bx} - 1) + p_2 q_3 (e^{b c_1 x} - 1) > q_1 p_3 (e^{by} - 1) + q_2 p_3 (e^{b d_1 y} - 1) \tag{3.46}$$

use A; if the reverse inequality holds, employ B; if equal, either is applicable.

Observe that, as should be true, the limiting solution as $b \to 0$ is exactly that obtained from $\phi(z) = z$.
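The decision rule of Theorem 3.3 is easy to evaluate directly. The sketch below is an illustration only: it takes the criterion in the form obtained by substituting $\phi(z) = e^{bz}$ into the boundary equation of Section 3.6, namely the sign of $p_1 q_3 (e^{bx} - 1) + p_2 q_3 (e^{b c_1 x} - 1) - q_1 p_3 (e^{by} - 1) - q_2 p_3 (e^{b d_1 y} - 1)$. The names `exp_margin` and `choose`, and the numerical values, are our own. Dividing the margin by b and letting b become small recovers the linear criterion, in accordance with the remark on the limit $b \to 0$.

```python
import math


def exp_margin(x, y, b, p, q, c1, d1):
    """Left side minus right side of the exponential-utility criterion."""
    p1, p2, p3 = p
    q1, q2, q3 = q
    lhs = p1 * q3 * math.expm1(b * x) + p2 * q3 * math.expm1(b * c1 * x)
    rhs = q1 * p3 * math.expm1(b * y) + q2 * p3 * math.expm1(b * d1 * y)
    return lhs - rhs


def choose(x, y, b, p, q, c1, d1):
    """Return the first move suggested by the sign of the margin."""
    m = exp_margin(x, y, b, p, q, c1, d1)
    if m > 0:
        return "A"
    if m < 0:
        return "B"
    return "either"


# Illustrative parameters (not from the text).
P, Q, C1, D1 = (0.5, 0.3, 0.2), (0.4, 0.4, 0.2), 0.4, 0.5
linear_margin = 0.5 * 0.2 * 1.0 + 0.3 * 0.2 * 0.4 * 1.0 - 0.4 * 0.2 * 1.0 - 0.4 * 0.2 * 0.5 * 1.0
small_b_margin = exp_margin(1.0, 1.0, 1e-8, P, Q, C1, D1) / 1e-8
```

Note that the criterion involves only x and y and not the amount already obtained, which is the property singled out at the beginning of this section.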


3.8.  Asymptotic  Behavior  of  g(x,  y) 

We now turn to the problem of determining the asymptotic behavior of $g(x, y)$ as x and $y \to \infty$. We begin by deriving the asymptotic behavior of $g(x, 0)$ and $g(0, y)$. From the equation we obtain, for large x,

$$g(x, 0) = p_1 e^{bx} + p_3 + p_2 e^{b c_1 x} g(c_2 x, 0). \tag{3.47}$$

This equation may be solved by iteration:

$$g(x, 0) = (p_1 e^{bx} + p_3) + p_2 e^{b c_1 x}(p_1 e^{b c_2 x} + p_3) + \cdots. \tag{3.48}$$


* This requirement of continuity can be considerably weakened.

To obtain the asymptotic behavior, however, we must proceed differently. Set

$$g(x, 0) = \frac{p_1 e^{bx}}{1 - p_2} + h(x) e^{bx}, \tag{3.49}$$

where h satisfies the equation

$$h(x) = p_3 e^{-bx} + p_2 h(c_2 x), \tag{3.50}$$

as we see by direct substitution. Although iteration yields

$$h(x) = p_3 e^{-bx} + p_2 p_3 e^{-b c_2 x} + \cdots, \tag{3.51}$$

the asymptotic behavior of $h(x)$ is still not apparent. We shall show that $h(x) = x^{-a_1} \psi(x)[1 + o(1)]$ as $x \to \infty$, where $\psi(x) = \psi(c_2 x)$, $a_1 = (\log 1/p_2)/(\log 1/c_2)$. To accomplish this, set $h(x) = k(x) x^{-a_1}$. Then k satisfies the simpler equation

$$k(x) - k(c_2 x) = p_3 x^{a_1} e^{-bx} = \phi(x). \tag{3.52}$$

The essential fact about $\phi$ that we shall use is that $\sum_{n=1}^{\infty} \phi(x/c_2^n)$ converges for each x. From (3.52) we have

$$k(x/c_2^n) = k(x) + \sum_{m=1}^{n} \phi(x/c_2^m), \tag{3.53}$$

which yields

$$\lim_{n \to \infty} k(x/c_2^n) = k(x) + \sum_{n=1}^{\infty} \phi(x/c_2^n) = \psi(x). \tag{3.54}$$

From the form of the limit function or from the equation for $k(x)$, we see that $\psi(x) = \psi(c_2 x)$ for all x. If then we write $y = x/c_2^n$ for $1 \le x \le 1/c_2$, we have

$$k(y) = k(x/c_2^n) = \psi(x)[1 + o(1)] = \psi(x/c_2^n)[1 + o(1)] = \psi(y)[1 + o(1)], \tag{3.55}$$

as $y \to \infty$.
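The substitution that removes the factor $p_2$ can be checked numerically. The following sketch is illustrative only and rests on our reading of the equations above: h is taken to be the series $\sum_n p_2^n p_3 e^{-b c_2^n x}$, the exponent is $\log(1/p_2)/\log(1/c_2)$, and k is h multiplied by x raised to that exponent. The names `h_series`, `k_of` and `exponent`, and the sample values, are our own. The difference $k(x) - k(c_2 x)$ should then equal $p_3$ times $x$ to the exponent times $e^{-bx}$.

```python
import math


def exponent(p2, c2):
    """The exponent log(1/p2) / log(1/c2), for which p2 * c2**(-exponent) = 1."""
    return math.log(1 / p2) / math.log(1 / c2)


def h_series(x, b, p2, p3, c2, terms=400):
    """Truncated series for h obtained by iterating h(x) = p3*exp(-b*x) + p2*h(c2*x)."""
    return sum(p2 ** n * p3 * math.exp(-b * c2 ** n * x) for n in range(terms))


def k_of(x, b, p2, p3, c2):
    """k(x) = x**exponent * h(x)."""
    return x ** exponent(p2, c2) * h_series(x, b, p2, p3, c2)


# Illustrative values (not from the text).
B, P2, P3, C2, X = 1.0, 0.3, 0.2, 0.6, 5.0
lhs = k_of(X, B, P2, P3, C2) - k_of(C2 * X, B, P2, P3, C2)
rhs = P3 * X ** exponent(P2, C2) * math.exp(-B * X)
```

Evaluating `k_of` along the sequence $x/c_2^n$ for increasing n gives a simple way of observing the approach to the periodic limit function.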

Collecting the previous results, we see that the asymptotic behavior of $g(x, 0)$ is given by

$$g(x, 0) = e^{bx}\left[\frac{p_1}{1 - p_2} + x^{-a_1}\psi(x)[1 + o(1)]\right], \tag{3.56}$$

where

(a) $\psi(x) = \psi(c_2 x)$,

(b) $a_1 = \dfrac{\log (1/p_2)}{\log (1/c_2)}$. (3.57)

The corresponding result for $g(0, y)$ is

$$g(0, y) = e^{by}\left[\frac{q_1}{1 - q_2} + y^{-b_1}\xi(y)[1 + o(1)]\right], \tag{3.58}$$

where

(a) $\xi(y) = \xi(d_2 y)$,

(b) $b_1 = \dfrac{\log (1/q_2)}{\log (1/d_2)}$. (3.59)

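Turning to the equation for $g(x, y)$ we have, for x and y large,

$$g(x, y) = \max \begin{cases} \dfrac{p_1 q_1}{1 - q_2} e^{b(x + y)} + p_2 e^{b c_1 x} g(c_2 x, y) + O(\,\cdot\,) \\[2mm] \dfrac{p_1 q_1}{1 - p_2} e^{b(x + y)} + q_2 e^{b d_1 y} g(x, d_2 y) + O(\,\cdot\,) \end{cases}. \tag{3.60}$$

Setting $h(x, y) e^{b(x + y)} = g(x, y)$, we obtain

$$h(x, y) = \max \begin{cases} \dfrac{p_1 q_1}{1 - q_2} + p_2 h(c_2 x, y) + O(\,\cdot\,) \\[2mm] \dfrac{p_1 q_1}{1 - p_2} + q_2 h(x, d_2 y) + O(\,\cdot\,) \end{cases}. \tag{3.61}$$

To simplify still further, we set $h(x, y) = \alpha + k(x, y)$, obtaining

$$\alpha + k(x, y) = \max \begin{cases} \dfrac{p_1 q_1}{1 - q_2} + p_2 \alpha + p_2 k(c_2 x, y) + O(\,\cdot\,) \\[2mm] \dfrac{p_1 q_1}{1 - p_2} + q_2 \alpha + q_2 k(x, d_2 y) + O(\,\cdot\,) \end{cases}. \tag{3.62}$$

If $\alpha$ is chosen to be the common solution of

$$\alpha = \frac{p_1 q_1}{1 - q_2} + p_2 \alpha = \frac{p_1 q_1}{1 - p_2} + q_2 \alpha, \tag{3.63}$$

namely, $p_1 q_1/(1 - p_2)(1 - q_2)$, (3.62) simplifies to

$$k(x, y) = \max \begin{cases} p_2 k(c_2 x, y) + O(\,\cdot\,) \\ q_2 k(x, d_2 y) + O(\,\cdot\,) \end{cases}. \tag{3.64}$$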


To estimate $k(x, y)$, we use the fact that the solution may be obtained by means of successive approximations:

$$k_{n+1}(x, y) = \max \begin{cases} p_2 k_n(c_2 x, y) + O(\,\cdot\,) \\ q_2 k_n(x, d_2 y) + O(\,\cdot\,) \end{cases}, \qquad k_0(x, y) = \frac{1}{x^r + y^r}, \tag{3.65}$$

considering, for our purposes, only values of x and y greater than 1. The exponent r will be chosen in a moment.

If we have an inequality of the type $k_n(x, y) \le u_n/(x^r + y^r)$, $u_n$ being a constant, which inequality is certainly valid for $n = 0$, we obtain

$$k_{n+1}(x, y) \le \max \begin{cases} \dfrac{p_2 u_n}{c_2^r x^r + y^r} + O(\,\cdot\,) \\[2mm] \dfrac{q_2 u_n}{x^r + d_2^r y^r} + O(\,\cdot\,) \end{cases}. \tag{3.66}$$

Choose r so that $p_2/c_2^r < 1/2$, $q_2/d_2^r < 1/2$. Since $a_1, b_1 > r$, we see, since $x^r e^{-bx} \le d_r$ for all x, that the remainder terms are at most $d_r/x^r y^r \le d_r/(x^r + y^r)$, for $x, y \ge 1$. Hence, we have

$$k_{n+1}(x, y) \le \max \begin{cases} \dfrac{1}{2}\dfrac{u_n}{x^r + y^r} + \dfrac{1}{2}\dfrac{a_2}{x^r + y^r} \\[2mm] \dfrac{1}{2}\dfrac{u_n}{x^r + y^r} + \dfrac{1}{2}\dfrac{a_2}{x^r + y^r} \end{cases} \tag{3.67}$$

for some constant $a_2$. If we take $u_{n+1} = \tfrac{1}{2}(u_n + a_2)$, the inequality is preserved for $n + 1$. Since $u_n$ as defined by the recurrence relation is uniformly bounded, we obtain, in the limit, $k(x, y) \le a_2/(x^r + y^r)$.

Knowing the form of the function, we readily obtain the optimal policy, deriving in this case the slightly paradoxical result that, asymptotically, as x and $y \to \infty$, it makes no difference which move is made first.

Collecting the above results, we obtain

$$g(x, y) = e^{b(x + y)}\left[\frac{p_1 q_1}{(1 - p_2)(1 - q_2)} + O\left(\frac{1}{x^r + y^r}\right)\right]. \tag{3.68}$$


3.9. A More General Problem


We  have,  in  the  previous  sections,  considered  the  equations  resulting  from  situations 
in  which  two  choices  are  available  at  each  stage.  Let  us  now  discuss  a  three-choice 
problem,  as  represented  by  the  functional  equation 


$$f(x, y) = \max \begin{cases} A: & p_1[r_1 x + f(s_1 x, y)] \\ B: & p_2[r_2 y + f(x, s_2 y)] \\ C: & p_3[r_3 x + r_4 y + f(s_3 x, s_4 y)] \end{cases}, \tag{3.69}$$

where

$$0 < p_1, p_2, p_3 < 1, \quad 0 < r_i < 1, \quad r_i + s_i = 1, \quad i = 1, 2, 3, 4.$$

It might be suspected, on the basis of the previous results, that there will always exist three sectors, $R_A$, $R_B$, $R_C$, as pictured in Fig. 3.8, which determine the choice of the A, B, or C moves. Unfortunately, this is not true for general values of the parameters, since it has been shown that there is an equation of the form of (3.69) for which the decision regions are as shown in Fig. 3.9.


From  the  fact  that  there  exists  a  problem  whose  solution  involves  four  decision 
regions,  it  follows  immediately  that  the  general  solution  of  the  multidecision  problem 
cannot  have  the  simple  form  of  the  solution  in  Section  3.2. 

At the present time, although little is known about the general solution of the k-choice analogue of (3.69), it seems fairly certain that its general solution will possess a complicated and extremely unintuitive structure. It is not even known whether or not there is always a finite number of regions for any particular equation, and, if so, whether this number can be arbitrarily large or must be bounded by a number depending on k.

We shall illustrate a number of partially successful approaches by considering the two equations of special form

$$f(x, y) = \max \begin{cases} A: & p_1[r_1 x + f(s_1 x, y)] \\ B: & p_2[r_2 y + f(x, s_2 y)] \\ C: & p_3[sx + sy + f(tx, ty)] \end{cases} \tag{3.70}$$

and

$$f(x, y) = \max \begin{cases} x + f(ax, by) \\ y + f(cy, dx) \end{cases}. \tag{3.71}$$


Before turning to a discussion of these equations, let us note that equations of this type also arise in connection with testing problems of an interesting type.

Consider the simplest version, in which we are given the information that a ball is in one of N boxes, and the a priori probability, $p_k$, that it is in the kth box. Assuming that each observation consumes one unit of time, it seems intuitively clear that we look in the most likely box first, in order to minimize the expected time required to find the ball. Note, however, that, for the case of two boxes, if we are merely interested in determining which box contains the ball, it makes no difference which box we examine first. If, however, we want to obtain the ball or to observe it, then it is best to examine the most likely spot.

Let us now consider the more general situation in which observation of the kth box consumes time $t_k$, and in which there is a probability $q_k$ that if the kth box is observed, one is unable to examine its contents or to obtain them.

Theorem 3.4. If we wish to obtain the ball, the optimal policy is to examine the box for which

$$\frac{p_k(1 - q_k)}{t_k} \tag{3.72}$$

is a maximum.

If we wish merely to locate the box containing the ball, the box for which (3.72) is a maximum is examined first or is never examined.
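The index of Theorem 3.4 can be illustrated in the simplest special case. The sketch below assumes $q_k = 0$ for every box, so that each box need be examined at most once and the index reduces to $p_k/t_k$; the expected time to obtain the ball is then the sum over boxes of $p_k$ times the time elapsed when box k has been examined. Under this assumption the ordering by decreasing index can be compared with an exhaustive search over all orderings. The function names and the numerical data are our own.

```python
from itertools import permutations


def search_index(p_k, q_k, t_k):
    """The quantity p_k * (1 - q_k) / t_k."""
    return p_k * (1 - q_k) / t_k


def expected_time(order, p, t):
    """Expected time to find the ball when boxes are opened in `order` (all q_k = 0)."""
    elapsed = 0.0
    total = 0.0
    for k in order:
        elapsed += t[k]
        total += p[k] * elapsed
    return total


def order_by_index(p, t):
    """Boxes sorted by decreasing index, with q_k = 0."""
    return sorted(range(len(p)), key=lambda k: search_index(p[k], 0.0, t[k]), reverse=True)


# Illustrative data (not from the text).
p = [0.2, 0.5, 0.3]
t = [1.0, 4.0, 2.0]
greedy = order_by_index(p, t)
best = min(expected_time(o, p, t) for o in permutations(range(3)))
```

When the $q_k$ are positive, a box may have to be examined more than once and the probabilities must be revised after each unsuccessful examination; that case is not covered by this sketch.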

More interesting and difficult problems arise in situations in which the testing disturbs the probability distribution. For a two-box model, this leads to functional equations of the form

$$f(p_1, p_2) = \min \begin{cases} p_1 + (1 - p_1)[1 + f(a_{12}, a_{22})] \\ p_2 + (1 - p_2)[1 + f(a_{11}, a_{21})] \end{cases}, \tag{3.73}$$

which is easily resoluble. However, for three boxes we obtain

$$f(p_1, p_2, p_3) = \min_i \{p_i + (1 - p_i)[1 + f(p_{1i}, p_{2i}, p_{3i})]\}, \tag{3.74}$$

where

$$p_{11} = \frac{a_{12} p_2 + a_{13} p_3}{1 - p_1}, \qquad p_{21} = \frac{a_{22} p_2 + a_{23} p_3}{1 - p_1}, \qquad p_{31} = \frac{a_{32} p_2 + a_{33} p_3}{1 - p_1}, \tag{3.75}$$

and so on.

Functional equations of this type occur frequently in the theory of sequential analysis, in connection with problems in which the distribution is unknown and each observation yields additional information concerning it.


3.10. A Simple Three-choice Equation

Equation (3.70) may be written

$$f(x, y) = \max \begin{cases} A: & p_1[r_1 x + f(s_1 x, y)] \\ B: & p_2[r_2 y + f(x, s_2 y)] \\ C: & p_3[sx + sy + t f(x, y)] \end{cases} \tag{3.76}$$

by virtue of the homogeneity of $f(x, y)$, and, as a consequence, in the simpler form

$$f(x, y) = \max \begin{cases} A: & p_1[r_1 x + f(s_1 x, y)] \\ B: & p_2[r_2 y + f(x, s_2 y)] \\ C: & \dfrac{p_3 s (x + y)}{1 - p_3 t} \end{cases}. \tag{3.77}$$

For this equation we can prove

Theorem 3.5. If $0 < r_i, s, t, p_1, p_2, p_3 < 1$, $r_i + s_i = 1$, there are at most three decision regions.

The proof, which we shall merely sketch, is more interesting than the result and is applicable to more general situations. The basic idea is to employ a continuity method, using an appropriate parameter, which in this case is s. For $s = 0$, there are actually two regions, as we know from the previous results. It is now not difficult to show that as s varies between 0 and 1, the number of regions does not exceed three.


3.11. The Equation f(x, y) = Max [x + f(ax, by), y + f(cy, dx)]

As another example of an equation in the case of which special techniques are applicable, let us consider

$$f(x, y) = \max \begin{cases} A: & x + f(ax, by) \\ B: & y + f(cy, dx) \end{cases}, \tag{3.78}$$

where we shall assume that $0 < a, b, c, d < 1$. Under these conditions we know that there is a unique solution. Actually, these conditions are too strong, since $0 < cd < 1$ is sufficient to ensure existence and uniqueness.

The principal result we shall obtain is

Theorem 3.6. All optimal strategies are periodic from some point on.

Let us note that an A-choice sends (x, y) into (ax, by), and that a B-choice sends (x, y) into (cy, dx). We observe that the motion induced by $B^2$ sends (x, y) into (cdx, cdy), or, more precisely, if the optimal policy is $B^2O$ (read "$B^2$ optimal"), then

$$f(x, y) = (y + dx) + cd f(x, y). \tag{3.79}$$

From this we conclude that if $B^2O$ is an optimal policy, then $B^2O = \overline{B}$ ($\overline{C}$ denotes the fact that, for any C, the sequence of moves represented by C is repeated periodically); and in fact for a point (x, y) where this takes place,

$$f(x, y) = f_{\overline{B}}(x, y) = \frac{y + dx}{1 - cd}. \tag{3.80}$$

Next, suppose that for a point (x, y) the optimal strategy has the form $A^k B A^k B O$. Then

$$\begin{aligned} f(x, y) &= x(1 + a + a^2 + \cdots + a^{k-1}) + b^k y + f(c b^k y, d a^k x) \\ &= x\frac{1 - a^k}{1 - a} + b^k y + c b^k y \frac{1 - a^k}{1 - a} + b^k d a^k x + f(a^k b^k cd\,x, a^k b^k cd\,y) \\ &= P(x, y) + a^k b^k cd\, f(x, y). \end{aligned} \tag{3.81}$$

From this we conclude that if $A^k B A^k B O$ is optimal, then the $A^k B$ pattern repeats periodically, i.e.,

$$A^k B A^k B O = \overline{A^k B}. \tag{3.82}$$

Similarly, if $B A^k B A^k O$ is optimal, we obtain

$$B A^k B A^k O = \overline{B A^k}. \tag{3.83}$$

Also, in this case,

$$f_{\overline{BA^k}}(x, y) = \frac{(y + b^k d x)\left[1 + c\,\dfrac{1 - a^k}{1 - a}\right]}{1 - cd\,a^k b^k}. \tag{3.84}$$
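The closed forms for periodic strategies can be checked by simply playing out the moves. The sketch below is illustrative only: `play` accumulates the returns of a finite sequence of A and B moves under the transformations (x, y) into (ax, by) and (x, y) into (cy, dx), and the two closed forms are those obtained by summing the returns over one full period and dividing by one minus the contraction factor, namely $(y + dx)/(1 - cd)$ for B repeated and $(y + b^k d x)[1 + c(1 - a^k)/(1 - a)]/(1 - cd\,a^k b^k)$ for $BA^k$ repeated. The function names and numerical values are our own.

```python
def play(x, y, moves, a, b, c, d):
    """Total return of a finite sequence of moves, each 'A' or 'B'."""
    total = 0.0
    for m in moves:
        if m == "A":
            total += x
            x, y = a * x, b * y
        else:
            total += y
            x, y = c * y, d * x
    return total


def value_b_forever(x, y, c, d):
    """Return of B repeated indefinitely."""
    return (y + d * x) / (1 - c * d)


def value_bak_forever(x, y, k, a, b, c, d):
    """Return of the pattern B followed by k A's, repeated indefinitely."""
    geo = (1 - a ** k) / (1 - a)
    return (y + b ** k * d * x) * (1 + c * geo) / (1 - c * d * (a * b) ** k)


# Illustrative parameters (not from the text).
a, b, c, d = 0.5, 0.6, 0.7, 0.8
sim_b = play(1.0, 2.0, "B" * 400, a, b, c, d)
sim_bak = play(1.0, 2.0, ("B" + "A" * 3) * 200, a, b, c, d)
```

Comparing such values for the candidate periodic strategies at a given point is a direct, if laborious, way of determining which is optimal there.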


We are now in a position to classify completely the optimal strategies. First, broader classifications are obtained, and from these obvious eliminations are made to achieve the final list. As a first crude classification for the optimal strategies beginning with A,

$$AO = \begin{cases} (1) & [\overline{A}], \\ (2) & [A^l \overline{B}], \\ (3) & A^l B O \end{cases} \tag{3.85}$$

(we put brackets around a strategy when no further subclassification will be made using this form). Considering those strategies of (3), above, which are not in (2), we have, since $B^2 O = \overline{B}$,

$$AO = A^l B O = A^l B A O;$$

then,

$$A^l B A O = \begin{cases} (3') & [A^l B \overline{A}], \\ (3'') & A^l B A^k B O. \end{cases} \tag{3.86}$$

Next, in case (3'') we have two cases according as $l \ge k$ or $l < k$:

Case 3'' ($l \ge k$)

In this case we have, if AO is optimal, $AO = A^l B A^k B O = A^{l-k}(A^k B A^k B O)$; and since at the state reached after $l - k$ applications of A, $A^k B A^k B O$ would be optimal, the above, together with (3.82), implies that

$$AO = A^{l-k}\,\overline{A^k B} = [A^l\,\overline{B A^k},\ l \ge k]. \tag{3.87}$$

Case 3'' ($l < k$)

This case leads to three subcases:

$$AO = A^l B A^k B O = \begin{cases} (3''_1) & A^l B A^k B \overline{A}, \\ (3''_2) & A^l B A^k B A^r B O, \\ (3''_3) & [\overline{A^l B A^k B}]. \end{cases} \tag{3.88}$$

Subcase $3''_1$

This subcase implies that, since $l < k$,

$$AO = A^l (B A^k B A^k O) = [A^l\,\overline{B A^k}], \tag{3.89}$$

via (3.82).

Subcase $3''_2$

In this subcase we again have two cases, according as $k > r$ or $k \le r$.

If $k > r$, we have $A^l B A^k B A^r B O = A^l B A^{k-r}(A^r B A^r B O) = [A^l B A^k\,\overline{B A^r},\ k > \mathrm{Max}(l, r)]$. On the other hand, if $k \le r$, $A^l B A^k B A^r B O = A^l (B A^k B A^k O) = [A^l\,\overline{B A^k}]$.

Collating the classification carried out above, we see that a list which includes all optimal strategies beginning with A is

(a1) $\overline{A}$,
(a2) $A^l \overline{B}$, $l = 1, 2, \cdots$,
(a3) $A^l B \overline{A}$, $l = 1, 2, \cdots$,
(a4) $A^l\,\overline{B A^k}$, $l = 1, 2, \cdots$; $k = 1, 2, \cdots$,
(a5) $\overline{A^l B A^k B}$, $1 \le l < k$,
(a6) $A^l B A^k\,\overline{B A^r}$, $k > \mathrm{Max}(l, r) \ge 1$. (3.90)


Next we consider optimal strategies beginning with B. It is quite clear that either $BO = \overline{B}$ or $BO = BAO$. Thus, we see immediately from (3.89) that a list of possible optimal strategies beginning with B is

(b1) $\overline{B}$,
(b2) $B A^l \overline{B}$, $l = 1, 2, \cdots$,
(b3) $\overline{B A^l}$, $l = 1, 2, \cdots$,
(b4) $B A^l\,\overline{B A^k}$, $k < l$,
(b5) $B \overline{A}$. (3.91)

Although it is now possible to obtain the decision regions explicitly by computing the results of the allowable optimal strategies, the amount of effort required is so great that another technique is employed. In place of this approach, a combination of the geometric treatment discussed in the next section, together with the analytic approach already employed, yields the information concerning the number of connected decision regions. Since the proof requires a detailed investigation, we shall omit it here.

3.12.  An  Illustration  of  the  Geometric  Approach 

Let us consider the equation

$$f(x, y) = \max \begin{cases} A: & ax + r f(bx, y) \\ B: & cy + s f(x, dy) \end{cases} \tag{3.92}$$

and obtain its solution by using the geometric techniques discussed in Section 2.8. As we know, the solution will be to employ B whenever

$$\frac{y}{x} \ge \frac{a(1 - s)}{c(1 - r)} \tag{3.93}$$

and to employ A whenever the reverse inequality holds, assuming, as we shall, that $0 < a, b, c, d, r, s < 1$.

To prove this result, we consider first the set of all strategies of the form ABT and construct their subenvelope from above, $E_{AB} = \operatorname{Env}_T L(ABT)$. Similarly, we form $E_{BA} = \operatorname{Env}_R L(BAR)$. Then let $E = \operatorname{Env}\{E_{AB}, E_{BA}\}$. For a given strategy T,

$$\begin{aligned} f_{ABT}(x, y) &= ax + rcy + rs f_T(bx, dy), \\ f_{BAT}(x, y) &= cy + sax + rs f_T(bx, dy), \end{aligned} \tag{3.94}$$

so that L(ABT) and L(BAT) intersect at the normalized point $y_1$ corresponding to $y/x = a(1 - s)/c(1 - r)$, $x + y = 1$. (More precisely, $y_1 = a(1 - s)/[c(1 - r) + a(1 - s)]$.) Note in particular that $y_1$ is independent of the choice of strategies. Furthermore, for $y > y_1$, L(ABT) lies below L(BAT), since there $cy(1 - r) > ax(1 - s)$. Thus, E consists of $E_{AB}$ for $y \le y_1$ and of $E_{BA}$ for $y \ge y_1$. Hence, with respect to the strategies included in the subenvelope E, A is an optimal initial choice to the left of $y_1$, and B is optimal to the right of $y_1$. To complete the proof of the theorem, we need only show that this property is preserved after we pass to the full envelope by taking the envelope of E and the lines of the strategies not yet considered. These lines are of the form $L(A^k BT)$, $L(B^k AR)$, $k > 1$; $L(A^\infty)$ and $L(B^\infty)$.

If a line $L(A^k BT)$, $k > 1$, touches the envelope E at a point $y_0$, L(ABT) also touches the envelope and to the right of $y_0$, since A transforms the decision at (x, y) into one at (bx, y), and the normalized (bx, y) is larger than the normalized (x, y). Thus, if $y_0 \ge y_1$, L(ABT) would touch E to the right of $y_1$, which is impossible, since L(ABT) lies below the subenvelope E for $y > y_1$. A symmetric argument disposes of the $L(B^k AR)$, $k > 1$. As for the two lines $L(A^\infty)$ and $L(B^\infty)$, these are limits of the $L(A^k BT)$, $L(B^k AR)$, respectively, as $k \to \infty$, and they can in no way affect the ultimate envelope E. In fact, clearly $A^\infty$ is optimal only at $y = 0$, and $B^\infty$ at $y = 1$. This completes the proof of the theorem.

The problem considered above has a finite analogue, as discussed in Section 3.5, whose solution constitutes a more precise form of the above result. Namely, letting $N = 1, 2, \cdots$, we consider

$$f(N, x, y) = \max \begin{cases} A: & ax + r f(N - 1, bx, y) \\ B: & cy + s f(N - 1, x, dy) \end{cases}, \tag{3.95}$$

where $f(0, x, y) \equiv 0$. Determining $f(N, x, y)$ for each N reduces to considering strategies $S_N$ of N steps, each either A or B, and calculating $f_{S_N}(N, x, y)$ via (3.95) and then determining which of these is largest. Thus, for each point (x, y), $f(N, x, y)$ is the maximum of $2^N$ numbers. Again we wish to characterize the optimal initial choice, which now, of course, depends on N.
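The finite process lends itself to direct computation. The sketch below is an illustration under our own choice of parameter values: it evaluates the N-stage return by the recursion f(N, x, y) = Max [ax + r f(N - 1, bx, y), cy + s f(N - 1, x, dy)] with f(0, x, y) = 0, and reports which alternative furnishes the maximum at the first stage. The names `f_stage` and `first_choice` are hypothetical. For the values used here the ratio $a(1 - s)/c(1 - r)$ is about 1.29, and the computed first choices fall on the expected sides of it.

```python
def f_stage(n, x, y, a, b, c, d, r, s):
    """N-stage return, computed by exhaustive recursion over 2**n strategies."""
    if n == 0:
        return 0.0
    use_a = a * x + r * f_stage(n - 1, b * x, y, a, b, c, d, r, s)
    use_b = c * y + s * f_stage(n - 1, x, d * y, a, b, c, d, r, s)
    return max(use_a, use_b)


def first_choice(n, x, y, a, b, c, d, r, s):
    """Optimal initial move, 'A' or 'B', for an n-stage process (n >= 1)."""
    use_a = a * x + r * f_stage(n - 1, b * x, y, a, b, c, d, r, s)
    use_b = c * y + s * f_stage(n - 1, x, d * y, a, b, c, d, r, s)
    return "A" if use_a >= use_b else "B"


# Illustrative parameters (not from the text); threshold y/x = a(1-s)/(c(1-r)).
params = (0.6, 0.5, 0.7, 0.4, 0.8, 0.7)
threshold = 0.6 * (1 - 0.7) / (0.7 * (1 - 0.8))
choice_low = first_choice(10, 1.0, 1.0, *params)   # y/x = 1 is below the threshold
choice_high = first_choice(10, 1.0, 2.0, *params)  # y/x = 2 is above the threshold
```

For N = 1 the rule is merely to compare ax with cy, so that `first_choice(1, 1.0, 1.0, *params)` gives B; the dependence of the initial choice on N is thus already visible in this small example.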

Theorem 3.7. There exists an $N_0 = N_0(b, d, r, s)$ such that for $N > N_0$, the optimal initial choice for (N, x, y) is A if $y < y_1$ and B if $y > y_1$. More precisely, we may take

$$N_0 = \mathrm{Max}\left[2,\ 2 + \frac{\log e}{\log (1/b)},\ 2 + \frac{\log e'}{\log (1/d)}\right], \tag{3.96}$$

where

$$e = \frac{(1 - s + rb)(1 - r)}{1 - s}, \qquad e' = \frac{(1 - r + sd)(1 - s)}{1 - r}.$$

Furthermore, this is best possible in the sense that given a, b, c, d, r, there always exist values of s such that (1) $N_0$ is as large as we please; and (2) for all N, $2 \le N \le N_0$, there are points to the right of $y_1 = y_1(s)$ at which A is an optimal initial choice for an N-step strategy.

Proof. For $N = 1$ it is clear that one always chooses A at (x, y) if $ax > cy$, and chooses B otherwise. For $N = 2$ we see that L(AB) and L(BA) intersect at $y_1$; and thus, for $N = 2$ the optimal initial choice would be A to the left of $y_1$ and B to the right, except possibly for interference from the strategies $A^2$ and $B^2$. That is, it may be that $A^2$ appears in this "2-envelope" to the right of $y_1$, as in Fig. 3.10. Since $L(A^2)$ is above L(AB) at $y = 0$, this would mean that L(AB) is completely dominated.

Also, the intersection of $L(A^2)$ and L(BA) that occurs at $y_2$ is to the right of $y_1$. This is numerically equivalent to

$$\frac{a}{c}(1 - s + rb) > \frac{a(1 - s)}{c(1 - r)}. \tag{3.97}$$

The symmetric situation for $B^2$, i.e., $L(B^2)$ intersecting L(AB) to the left of $y_1$, entails

$$\frac{a}{c(1 - r + sd)} < \frac{a(1 - s)}{c(1 - r)}.$$

Since b and d lie between 0 and 1, both of these phenomena cannot occur simultaneously. For convenience we shall suppose henceforth that if one of them occurs, it is for $A^2$, i.e., $b > (1 - s)/(1 - r)$, so that $L(A^2)$ dominates L(AB) as in Fig. 3.10.


We now proceed to consider any fixed $N \ge 3$. The subenvelope of the lines of the N-strategies of the forms

$$A^k B S_{N-k-1}, \quad B^k A R_{N-k-1}, \quad k \ge 1, \tag{3.98}$$

is again separated into two parts by $y_1$ such that, insofar as these strategies are concerned, A is the optimal initial decision for $y < y_1$, and B for $y > y_1$. There remains then to account for the two strategies $A^N$ and $B^N$. If $B^N$ appears in this N-envelope to the left of $y_1$, $B^2$ will appear in the 2-envelope to the left of $y_1$, which is not the case (according to arrangements made above). Thus, $B^N$ does not alter the character of the initial decisions as determined by $y_1$. The last possibility to consider is, Does $A^N$ appear in the envelope at a point to the right of $y_1$? If it does, then $A^{N-2}$ moves this point to one farther to the right, at which $A^2$ appears in the 2-envelope. This last point, of course, must then be to the left of $y_2$. Translating these statements into numerical terms yields

$$\frac{1}{b^{N-2}}\,\frac{a(1 - s)}{c(1 - r)} \le \frac{a}{c}(1 - s + rb), \tag{3.99}$$

or

$$\left(\frac{1}{b}\right)^{N-2} \le \frac{(1 - s + rb)(1 - r)}{1 - s} = e. \tag{3.100}$$

If $e < 1$, then this is impossible, and we may take the $N_0$ of the theorem equal to 2. On the other hand, if $e \ge 1$, we have $N \le 2 + (\log e)/[\log (1/b)]$. Thus, in general, removing all asymmetries, we may take

$$N_0 = \mathrm{Max}\left[2,\ 2 + \frac{\log e}{\log (1/b)},\ 2 + \frac{\log e'}{\log (1/d)}\right], \tag{3.101}$$

where

$$e' = \frac{(1 - r + sd)(1 - s)}{1 - r}.$$
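The bound just obtained is readily evaluated. The sketch below assumes the reading $e = (1 - s + rb)(1 - r)/(1 - s)$ and $e' = (1 - r + sd)(1 - s)/(1 - r)$, with $N_0$ the largest of 2, $2 + \log e/\log(1/b)$ and $2 + \log e'/\log(1/d)$; the function name `n0_bound` and the sample values are our own. It also shows how the bound grows as s approaches 1 with the other parameters held fixed.

```python
import math


def n0_bound(b, d, r, s):
    """Largest of 2, 2 + log e / log(1/b), 2 + log e' / log(1/d)."""
    e = (1 - s + r * b) * (1 - r) / (1 - s)
    e_prime = (1 - r + s * d) * (1 - s) / (1 - r)
    return max(
        2.0,
        2 + math.log(e) / math.log(1 / b),
        2 + math.log(e_prime) / math.log(1 / d),
    )


# Illustrative values (not from the text).
moderate = n0_bound(0.5, 0.4, 0.8, 0.7)     # both e and e' are below 1 here
near_one = n0_bound(0.5, 0.4, 0.8, 0.999)   # s close to 1 makes e large
```

When e and e' are both less than 1 the logarithmic terms are negative and the bound reduces to 2.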

To demonstrate the "best possible" part of the theorem, we observe that given a, b, c, d, r and taking s close to 1, e, and consequently $N_0$, can be made as large as we please. We may ensure in the following manner that there are points to the right of $y_1$ at which A is an optimal initial choice for $N \le N_0$.

We first arrange for $L(B^2)$ to intersect L(BA) to the right of $y_2$. This is equivalent to $(a/c)(1 - s + rb) < a/cd$ or $1/d > 1 - s + rb$. This last, of course, is satisfied for s sufficiently close to 1. Then, for $N = 2$, $A^2$ dominates all BAT to the left of $y_2$, and, moreover, $A^2$ dominates all BS to the left of $y_2$, where $y_2 > y_1$. Now assume $k - 1 < N_0 - 1$, so that $A^{-(k-1)} y_2 = y_k > y_1$, and assume also that $(1)_{k-1}$: $A^{k-1}$ dominates all $ABT_{k-3}$ to the left of $y_k$, and that $(2)_{k-1}$: $A^{k-1}$ dominates all $BT_{k-2}$ to the left of $y_k$. We observe that under the conditions provided for earlier, for $k \ge 3$, $(1)_{k-1}$ implies $(2)_{k-1}$. We now wish to complete the induction by deriving $(1)_k$. That is, we must verify that $A^k$ dominates all $ABT_{k-2}$ to the left of $y_{k+1} = A^{-1}(y_k)$. This, however, is equivalent to $A^{k-1}$ dominating all $BT_{k-2}$ to the left of $A y_{k+1} = y_k$, which is precisely $(2)_{k-1}$. Since $k < N_0$, $y_{k+1} > y_1$, and the induction is completed.


CHAPTER  4 

THE FUNCTIONAL EQUATION
$f(x) = \max_{0 \le y \le x} \{ g(y) + h(x - y) + f[ay + b(x - y)] \}$
AND RELATED TOPICS


4.1.  Introduction 


In  this  chapter  we  shall  study  a  number  of  equations  possessing  a  common  structure. 
Since these equations are intractable analytically unless we make some simplifying assumptions, we shall devote our efforts to showing that certain simple hypotheses yield a number
of  interesting  and  important  results. 

We  shall  begin  with  a  discussion  of  the  equation 

$$f(x) = \max_{0 \le y \le x} \{ g(y) + h(x - y) + f[ay + b(x - y)] \}, \tag{4.1}$$

whose  origin  is  described  in  Section  1.6,  Problem  1.8,  and  continue  with 

$$f_k(x, y) = \max_{0 \le z \le x} \left[ p_k\,\phi(y + c_k z) + (1 - p_k)\,f_{k+1}(x - z,\, y + c_k z) \right], \tag{4.2}$$

devoting  some  time  to  presenting  a  simple  dynamic  programming  problem  that  gives  rise 
to  the  above  functional  equation. 

After this we shall turn our attention to the equation

$$f(x) = \max_{0 \le y \le x} \left[ g(y) + h(x - y) + \int_0^y f(y - s)\,k(s)\,ds \right] \tag{4.3}$$

and to the equation of optimal inventory

$$u(x) = \min_{y \ge x} \left\{ g(y - x) + a\,[M + u(0)]\,[1 - F(y)] + a \int_0^y u(y - s)\,F'(s)\,ds \right\}, \tag{4.4}$$

discussed in Section 1.8.6.

4.2. The Equation $f(x) = \max_{0 \le y \le x} \{ g(y) + h(x - y) + f[ay + b(x - y)] \}$

Let  us,  in  this  section,  consider  the  functional  equation 

$$f(x) = \max_{0 \le y \le x} \{ g(y) + h(x - y) + f[ay + b(x - y)] \}, \qquad f(0) = 0. \tag{4.5}$$

We shall begin our discussion of (4.5) by proving

Theorem 4.1. If

(a) $g(0) = h(0) = 0$,
(b) $g'(x), h'(x) > 0$, $g''(x), h''(x) > 0$, for $x > 0$,
(c) $\sum_n g(c^n x) < \infty$, $\sum_n h(c^n x) < \infty$,        (4.6)

where $c = \max[a, b]$, then the optimal policy consists in choosing y = 0 or x at each stage.

Proof.  Let  f(x)  be  defined  as  above,  and  define,  in  addition, 


$f_N(x)$ = return obtained using an optimal policy when only N stages are allowed.        (4.7)

We have

$$f_1(x) = \max_{0 \le y \le x} [g(y) + h(x - y)], \tag{4.8}$$

and, generally,

$$f_{N+1}(x) = \max_{0 \le y \le x} \{ g(y) + h(x - y) + f_N[ay + b(x - y)] \}, \tag{4.9}$$

for $N \ge 1$.

As  we  know  from  Section  2.3  of  the  chapter  on  existence  and  uniqueness  theorems,  the 
limit of $f_N(x)$ as $N \to \infty$ is $f(x)$, the unique solution to (4.5). Let us now demonstrate that the hypotheses of (4.6) yield the result that $f_N(x)$ is monotone increasing and convex.

To  establish  the  result  for  N  =  1,  we  observe  that  the  convexity  of  g  and  h  yields  the 
convexity of g(y) + h(x - y) as a function of y in [0, x]. Hence, the maximum is attained at an end point and

$$f_1(x) = \max[g(x), h(x)], \tag{4.10}$$

which is monotone increasing and convex. Let us now argue inductively. If the result has been established for N, it follows that $g(y) + h(x - y) + f_N[ay + b(x - y)]$ is convex and thus that its maximum occurs at y = 0 or x. Hence,

$$f_{N+1}(x) = \max[g(x) + f_N(ax),\; h(x) + f_N(bx)], \tag{4.11}$$

which shows that $f_{N+1}(x)$ is monotone increasing and convex.

Letting $N \to \infty$, we see that

$$f(x) = \max[g(x) + f(ax),\; h(x) + f(bx)], \tag{4.12}$$

which shows that the optimal policy is to choose y = 0 or x. Alternatively, we could use the convexity of f(x), obtained as a limit of convex functions $f_N(x)$, to establish (4.12) directly from (4.5).
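
As a numerical illustration of Theorem 4.1, the following minimal sketch iterates (4.8) and (4.9) on a uniform grid and measures how far the maximum over all grid values of y exceeds the better of the two end points y = 0 and y = x. The function name `endpoint_gap`, the grid size, the linear interpolation, and the sample functions used afterwards are assumptions made for illustration only; they are not part of the text.

```python
def endpoint_gap(g, h, a, b, x_max=1.0, n_grid=100, n_stages=5):
    """Iterate (4.8)-(4.9) on a grid and return the largest amount by which
    the maximum over all grid y in [0, x] exceeds the better end point."""
    step = x_max / n_grid
    xs = [i * step for i in range(n_grid + 1)]

    def interp(table, x):
        # Linear interpolation of a tabulated function on [0, x_max].
        pos = min(max(x / step, 0.0), float(n_grid))
        lo = min(int(pos), n_grid - 1)
        w = pos - lo
        return (1.0 - w) * table[lo] + w * table[lo + 1]

    f_prev = [0.0] * (n_grid + 1)  # f_0 = 0, so the first pass produces f_1
    worst = 0.0
    for _ in range(n_stages):
        f_next = []
        for i, x in enumerate(xs):
            vals = [
                g(xs[j]) + h(x - xs[j])
                + interp(f_prev, a * xs[j] + b * (x - xs[j]))
                for j in range(i + 1)
            ]
            best_all = max(vals)               # maximum over every grid y
            best_end = max(vals[0], vals[-1])  # y = 0 versus y = x
            worst = max(worst, best_all - best_end)
            f_next.append(best_all)
        f_prev = f_next
    return worst
```

With convex choices such as g(y) = y^2 and h(y) = 2y^2, and a = 0.8, b = 0.5, the returned gap should be zero up to rounding, in agreement with the theorem; with concave g and h one would expect a positive gap.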

Prior to a further study of (4.12), we shall establish a similar result for the equation in (4.2).

4.3. A Result Concerning Equation (4.2)

Consider the following problem: We are given initially x dollars and a quantity y of a serum, together with the prerogative of purchasing additional amounts of the serum at specified times $t_1 < t_2 < \cdots$. At the kth purchasing opportunity, $t_k$, a quantity $c_k z$ of serum may be purchased for z dollars, where $c_k$ is a monotone-increasing function of k. Given the probability that an epidemic occurs between $t_k$ and $t_{k+1}$, and the condition that if an epidemic occurs we may only use the amount of serum on hand, the problem is to determine the purchasing policy that maximizes the over-all probability of successfully combating an epidemic, given the probability of success with a quantity w of serum available.

The condition $c_k > c_{k-1}$ is imposed to indicate the cheaper cost of serum at a later date because of technological improvement. Let

$p_k$ = probability that the epidemic occurs between $t_k$ and $t_{k+1}$, assuming that it has not occurred previously,

$\phi(w)$ = probability of combating the epidemic successfully with a quantity w of serum,

$f_k(x, y)$ = over-all probability of success using an optimal purchasing policy from $t_k$ on, given x dollars and a quantity y of serum on hand.        (4.13)

Invoking  the  principle  that  an  optimal  policy  must  possess  an  optimal  continuation 
after  any  initial  action,  we  obtain  in  the  usual  manner  the  functional  equation 

$$f_k(x, y) = \max_{0 \le z \le x} \left[ p_k\,\phi(y + c_k z) + (1 - p_k)\,f_{k+1}(x - z,\, y + c_k z) \right]. \tag{4.14}$$

In order to state the following result in simple form, let us assume that $p_k = p$. We have then

Theorem 4.2. If

(a) $\phi(0) = 0$,
(b) $\phi(w)$ is monotone increasing and convex, for all values of w that occur,        (4.15)

then the optimal policy consists in purchasing no serum at $t_1, t_2, \ldots, t_{k-1}$ and in using all available money at $t_k$, where k is chosen so as to maximize

$$[1 - (1 - p)^{k-1}]\,\phi(y) + (1 - p)^{k-1}\,\phi(y + c_k x). \tag{4.16}$$

The proof is obtained in very much the same manner as above, employing the function

$f_{k,n}(x, y)$ = over-all probability of success using an optimal purchasing policy from $t_k$ on, given x dollars and a quantity y of serum on hand and exactly n subsequent purchasing times,        (4.17)

which satisfies the functional equation

$$f_{k,n}(x, y) = \max_{0 \le z \le x} \left[ p\,\phi(y + c_k z) + (1 - p)\,f_{k+1,n-1}(x - z,\, y + c_k z) \right]. \tag{4.18}$$

The  case  in  which  this  convexity  property  fails  is  much  more  difficult  to  resolve  and 
will  usually  require  a  purchase  of  a  certain  quantity  of  serum  at  each  purchasing  point. 
A  simpler  problem  of  this  type  is  discussed  in  more  detail  in  Section  4.5. 
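
The choice of purchasing time in Theorem 4.2 can be sketched numerically by evaluating the expression [1 - (1 - p)^(k-1)] φ(y) + (1 - p)^(k-1) φ(y + c_k x) for each candidate k. The function name `best_purchase_time`, the finite search horizon `k_max`, and the sample φ and c_k used afterwards are assumptions made for illustration; the text does not prescribe them.

```python
def best_purchase_time(phi, c, p, x, y, k_max):
    """Return (k, value) maximizing the criterion of Theorem 4.2 over
    k = 1, ..., k_max, where c(k) plays the role of c_k."""
    def value(k):
        q = (1.0 - p) ** (k - 1)  # probability of no epidemic before t_k
        return (1.0 - q) * phi(y) + q * phi(y + c(k) * x)

    k_best = max(range(1, k_max + 1), key=value)
    return k_best, value(k_best)
```

For instance, with the hypothetical convex choice φ(w) = (w/100)^2, c_k = k, p = 0.1, x = 1 and y = 0, the criterion reduces to a multiple of 0.9^(k-1) k^2, so that waiting is worthwhile as long as the improvement in c_k outweighs the risk of an epidemic occurring first.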


4.4. The Equation $f(x) = \max[g(x) + f(ax),\; h(x) + f(bx)]$

Let us now turn to a discussion of the equation

$$f(x) = \max[g(x) + f(ax),\; h(x) + f(bx)]. \tag{4.19}$$

It is difficult to obtain any analytic representation for the function f(x) or any description of the optimal policy, unless one makes some further assumptions concerning g and h. We shall pursue the analysis to the point where these assumptions are required and then illustrate the general method of attack by proving the following result:

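Theorem 4.3. The solution of

$$f(x) = \max[c x^d + f(ax),\; e x^f + f(bx)], \qquad f(0) = 0, \tag{4.20}$$

subject to

(a) $0 < a, b < 1$, $c, e > 0$,
(b) $0 < d < f$,        (4.21)

is given by

$$f(x) = \begin{cases} c x^d + f(ax), & 0 \le x \le x_0, \\ e x^f + f(bx), & x_0 \le x, \end{cases} \tag{4.22}$$

where

$$x_0 = \left[ \frac{c\,(1 - b^d)}{e\,(1 - a^f)} \right]^{1/(f - d)}. \tag{4.23}$$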

We shall represent by A the operation of choosing g(x) + f(ax) and by B the operation of choosing h(x) + f(bx). A solution corresponding to an optimal sequence of choices may be represented symbolically by

$$S = A^{a_1} B^{b_1} A^{a_2} B^{b_2} \cdots, \tag{4.24}$$

where the $a_i$ and $b_i$ are positive integers or zero.

We suspect from our previous work that the x values where

AB + optimal continuation = BA + optimal continuation        (4.25)

will play an important role in determining the solution. If A and then B is used, we obtain

$$f(x) = g(x) + h(ax) + f(abx), \tag{4.26}$$

while B and then A yields

$$f(x) = h(x) + g(bx) + f(abx). \tag{4.27}$$

The equation corresponding to (4.25) is then

$$g(x) + h(ax) = h(x) + g(bx). \tag{4.28}$$

Let us now make the simplifying assumption that this equation has exactly one non-zero solution, $\bar{x}$, and that AB > BA for $0 < x < \bar{x}$ and that AB < BA for $x > \bar{x}$. Without some condition of this type it seems very difficult to obtain a general solution.

Let us now show that either $A^\infty$ or $B^\infty$ (that is to say, A or B repeated indefinitely) is the optimal sequence in $[0, \bar{x}]$. Let

$$S_1 = B^{b_1} A^{a_1} B^{b_2} \cdots \tag{4.29}$$

be an optimal sequence for some x in $[0, \bar{x}]$. This may be written

$$B^{b_1 - 1}(BA)A^{a_1 - 1}B^{b_2} \cdots. \tag{4.30}$$

Since the result of applying B is to decrease x, after $(b_1 - 1)$ applications of B, the point x will still be in the interval $[0, \bar{x}]$. In this interval, BA plus optimal continuation is inferior to AB plus optimal continuation. It follows, therefore, that $S_1$ is majorized by

$$S_2 = B^{b_1 - 1}(AB)A^{a_1 - 1}B^{b_2} \cdots. \tag{4.31}$$

If $b_1 - 1 \ne 0$, we may continue in this way until we arrive at an optimal sequence for which A is a first move, provided that $b_1$ is not $\infty$, which is equivalent to saying: provided that A is used at all.

We see then that A is either used first or not at all. It follows that it is only necessary to compare $B^\infty$ and $A^k B^\infty$, of which a special case is $A^\infty$, in $[0, \bar{x}]$. The return corresponding to $B^\infty$ is

$$H(x) = h(x) + h(bx) + \cdots, \tag{4.32}$$

whereas $A^k B^\infty$ yields

$$G_k(x) = g(x) + \cdots + g(a^{k-1}x) + H(a^k x). \tag{4.33}$$

If for $0 \le x \le \bar{x}$, we have $H(x) > g(x) + H(ax) = G_1(x)$, then clearly $H(x) > g(x) + [g(ax) + H(a^2 x)] = G_2(x)$.

In order to continue, we must now make an assumption concerning the solutions of H(x) = g(x) + H(ax) and similarly of the equation G(x) = h(x) + G(bx). Imposing the condition that there are unique non-zero solutions, and proceeding by a systematic enumeration of cases, we may obtain the solution to (4.19). In place of a detailed account of the results in the general case, let us consider the simpler equation represented by (4.20). The equation AB = BA takes the simple form

$$c x^d + e a^f x^f = e x^f + c b^d x^d, \tag{4.34}$$

whose unique non-trivial solution is $x = x_0$, as given by (4.23). The equations H(x) = g(x) + H(ax), G(x) = h(x) + G(bx) turn out to be equivalent to (4.34).

Since we have assumed f > d, it is clear that A will be the best move for small x, where $0 \le x \le x_2$. If $x_2 < x < x_0$, and close enough to $x_2$ so that A and B both transform it into the interval $[0, x_0]$, it is clear that B at x implies that $BA^\infty$ will be the optimal sequence, whereas A implies $A^\infty$. Since $BA^\infty < A^\infty$ in $[0, x_0]$, it follows that B is not chosen at x. Continuing this procedure, we see that A is used throughout $[0, x_0]$. In exactly the same way we see that B is chosen for $x > x_0$.
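
For experimentation with an equation of this type, the N-stage return can be computed by direct recursion on the two choices A and B, starting from a zero return with no stages left. The sketch below is only an illustration under stated assumptions: the function name `f_N`, the parameter name `f_exp` (standing for the exponent written f in the text, renamed to avoid a clash with the function f), and the truncation to a finite number of stages are all hypothetical choices, not the author's method.

```python
def f_N(x, n, c, d, e, f_exp, a, b):
    """n-stage return for f(x) = Max[c*x**d + f(a*x), e*x**f_exp + f(b*x)].

    Returns (value, first_choice), where first_choice is "A", "B",
    or None when no stage remains.
    """
    if n == 0:
        return 0.0, None
    # Choice A: collect c*x**d and continue from a*x.
    val_a = c * x ** d + f_N(a * x, n - 1, c, d, e, f_exp, a, b)[0]
    # Choice B: collect e*x**f_exp and continue from b*x.
    val_b = e * x ** f_exp + f_N(b * x, n - 1, c, d, e, f_exp, a, b)[0]
    return (val_a, "A") if val_a >= val_b else (val_b, "B")
```

Evaluating `f_N(x, 12, ...)[1]` over a range of x values shows where the first choice switches from A to B for a given set of parameters. The recursion visits about 2^n states, so n should be kept small.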

A result of this type is useful for approximation purposes, since an increasing function of reasonably smooth growth can be approximated to some degree of accuracy by $cx^d$. Approximation of g(x) by $cx^d$ is equivalent to approximation of log g(x) by log c + d log x, and, finally, to log $g(e^u)$ by a straight line $c_1 + d_1 u$.
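
The straight-line approximation just described can be carried out by ordinary least squares on the points (log x, log g(x)). The following is a minimal sketch; the function name `fit_power_law` and the use of an unweighted least-squares fit are assumptions, since the text does not specify how the line is to be chosen.

```python
import math


def fit_power_law(xs, gs):
    """Fit g(x) ~ c * x**d by a least-squares line through (log x, log g).

    xs and gs must contain positive numbers. Returns (c, d).
    """
    us = [math.log(x) for x in xs]
    vs = [math.log(g) for g in gs]
    n = len(us)
    u_bar = sum(us) / n
    v_bar = sum(vs) / n
    # Slope of the fitted line is the exponent d.
    d = (sum((u - u_bar) * (v - v_bar) for u, v in zip(us, vs))
         / sum((u - u_bar) ** 2 for u in us))
    # Intercept of the fitted line is log c.
    c = math.exp(v_bar - d * u_bar)
    return c, d
```

When the data come from an exact power law the fit recovers c and d; otherwise it gives the best straight line in the logarithmic variables, which need not be the best fit in the original variables.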

Let us point out, finally, that the change of variable

$$x = e^u, \qquad f(e^u) = \phi(u), \tag{4.35}$$

converts (4.19) into the form

$$\phi(u) = \max[g_1(u) + \phi(u - a_1),\; h_1(u) + \phi(u - b_1)], \qquad \phi(-\infty) = 0, \quad u > -\infty, \tag{4.36}$$

which is also an equation of an interesting type.

It would be of some interest to determine the simplest conditions upon g and h which would ensure that an optimal policy always has the simple form shown above.

4.5.  The  Functions  g  and  h  Both  Concave 

Let us now return to the equation

$$f(x) = \max_{0 \le y \le x} \{ g(y) + h(x - y) + f[ay + b(x - y)] \}, \qquad f(0) = 0, \tag{4.37}$$

and assume that g and h are both concave increasing functions of x. The problem is now much more complex, and, in general, the optimal y will not be at an end point.

We shall prove

Theorem 4.4. Let

(a) $g(0) = h(0) = 0$,
(b) $g'(x), h'(x) > 0$ for $x > 0$,
(c) $g''(x), h''(x) < 0$ for $x > 0$,        (4.38)

and consider the sequence of approximations to f defined by

$$f_1(x) = \max_{0 \le y \le x} [g(y) + h(x - y)],$$

$$f_{n+1}(x) = \max_{0 \le y \le x} \{ g(y) + h(x - y) + f_n[ay + b(x - y)] \}, \qquad n = 1, 2, \ldots. \tag{4.39}$$

For each n, there is a unique $y_n = y_n(x)$ that yields the maximum. If b < a, we have $y_1 \le y_2 \le y_3 \le \cdots$, and the reverse inequalities for b > a. In particular, if $y_n(x) = x$ for some n in the case b < a, then $y_m(x) = x$ for $m \ge n$, and the solution of the original equation in (4.37) will be furnished by y = x.

This result is important in connection with determining approximate solutions, since it is quite simple to determine numerically $y_1$, $y_2$, and even $y_3$.
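
The numerical determination of the first maximizing values can be sketched by a plain grid search, applied once with a zero continuation (giving the first approximation) and once more with that approximation as the continuation. The function name `next_stage`, the grid size `m`, and the sample functions mentioned afterwards are assumptions made for illustration only.

```python
import math


def next_stage(g, h, a, b, f_prev, m=200):
    """Return (f_next, y_next): f_next(x) is the maximum over a grid of
    m + 1 values of y in [0, x] of g(y) + h(x - y) + f_prev(a*y + b*(x - y)),
    and y_next(x) is the grid point at which that maximum is attained."""
    def solve(x):
        best_val, best_y = -math.inf, 0.0
        for i in range(m + 1):
            y = x * i / m
            rest = max(x - y, 0.0)  # guard against tiny negative rounding
            val = g(y) + h(rest) + f_prev(a * y + b * rest)
            if val > best_val:
                best_val, best_y = val, y
        return best_val, best_y

    return (lambda x: solve(x)[0]), (lambda x: solve(x)[1])


# First approximation: no continuation term.
# f1, y1 = next_stage(math.sqrt, math.sqrt, 0.9, 0.1, lambda s: 0.0)
# Second approximation: continue with f1.
# f2, y2 = next_stage(math.sqrt, math.sqrt, 0.9, 0.1, f1)
```

With the hypothetical concave choice g = h = square root and a = 0.9, b = 0.1, the first maximizing value at x = 1 is the midpoint, and the second lies to its right, as the theorem asserts for b < a. Each further stage multiplies the cost by about m, so this direct nesting is practical only for the first two or three approximations.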

We shall begin by assuming that all the maxima occur within the interval [0, x] and shall then consider the case in which one $y_n(x) = x$. Considering the function $f_1(x)$, we see that its maximum, $y_1$, is determined by the equation

$$g'(y) = h'(x - y). \tag{4.40}$$

Since g and h are concave, the left-hand side is monotone decreasing in y and the right-hand side is monotone increasing, so there is at most one solution. If we assume $g'(0) > h'(x)$ and $h'(0) > g'(x)$, there will be exactly one solution of (4.40), which we call $y_1 = y_1(x)$. Differentiating (4.40), we obtain

$$y_1'\,g''(y_1) = (1 - y_1')\,h''(x - y_1), \tag{4.41}$$

which yields

$$y_1' = \frac{h''(x - y_1)}{g''(y_1) + h''(x - y_1)} > 0, \tag{4.42}$$

and

$$1 - y_1' > 0. \tag{4.43}$$

Turning to the expression for $f_1$, we have

$$f_1(x) = g(y_1) + h(x - y_1), \tag{4.44}$$

whence

$$f_1'(x) = g'(y_1)\,y_1' + (1 - y_1')\,h'(x - y_1) = h'(x - y_1), \tag{4.45}$$

using (4.40). Thus, $f_1'(x) > 0$ and $f_1''(x) = (1 - y_1')\,h''(x - y_1) < 0$, which means that $f_1(x)$ is concave.

Let us now turn to the function $f_2(x)$,

$$f_2(x) = \max_{0 \le y \le x} \{ g(y) + h(x - y) + f_1[ay + b(x - y)] \}. \tag{4.46}$$

Assuming that there is a maximum inside the interval, we obtain

$$g'(y) - h'(x - y) + (a - b)\,f_1'[ay + b(x - y)] = 0, \tag{4.47}$$

which we write

$$g'(y) + (a - b)\,f_1'[ay + b(x - y)] = h'(x - y). \tag{4.48}$$

The left-hand side is again strictly decreasing and the right-hand side strictly increasing, so that there is at most one solution which we call $y_2 = y_2(x)$, if it exists. Note that if there is no solution of (4.48), then

$$f_2(x) = g(x) + f_1(ax). \tag{4.49}$$

Let us, however, assume that there is a solution. Then,

$$f_2(x) = g(y_2) + h(x - y_2) + f_1[ay_2 + b(x - y_2)], \tag{4.50}$$

whence, as above, using (4.48),

$$f_2'(x) = h'(x - y_2) + b\,f_1'[ay_2 + b(x - y_2)]. \tag{4.51}$$

Using (4.48) again, this may be written

$$f_2'(x) = \frac{a\,h'(x - y_2) - b\,g'(y_2)}{a - b}. \tag{4.52}$$

This procedure is perfectly general, and we obtain, under our assumption concerning the existence of an internal maximum,

$$f_n'(x) = \frac{a\,h'(x - y_n) - b\,g'(y_n)}{a - b}, \qquad n = 1, 2, 3, \ldots. \tag{4.53}$$

We now wish to show that if b < a, then $y_1 \le y_2 \le \cdots$, and, conversely, if a < b, that $y_1 \ge y_2 \ge \cdots$. The two cases are really one, since we may interchange the roles of y and x - y if we so wish. Since $f_1' > 0$, we see, on comparing (4.48) and (4.40), that $y_1 < y_2$.

The equation for $y_3$ is

$$g'(y) + (a - b)\,f_2'[ay + b(x - y)] = h'(x - y). \tag{4.54}$$

If we can show that $f_2'(x) > f_1'(x)$, the same argument as that for $y_1$, $y_2$ shows that $y_3 > y_2$. Comparing (4.45) and (4.51), we see that $f_2' > f_1'$, since $h'(x - y_2) > h'(x - y_1)$.

To obtain the result for general n, always assuming that the maxima occur at inner points, we use (4.53). We know that $f_n'(x) > f_{n-1}'(x)$ implies that $y_{n+1} > y_n$. Since the function

$$\frac{a\,h'(x - y) - b\,g'(y)}{a - b} \tag{4.55}$$

is monotone increasing in y and $y_n > y_{n-1}$, via an inductive hypothesis, it follows that $f_n' > f_{n-1}'$ and thus that $y_{n+1} > y_n$.

Let us now consider the situation in which some $y_n(x) = x$. If n = 1, it is easy to see that $y_n(x) = x$, $n \ge 1$, since $y_1(x) = x$ means that $g'(y) > h'(x - y)$ for $0 \le y \le x$. Since

$$\frac{\partial}{\partial y} \{ g(y) + h(x - y) + f_1[ay + b(x - y)] \} = g'(y) - h'(x - y) + (a - b)\,f_1'[ay + b(x - y)] \tag{4.56}$$

and a > b, we see that this expression is positive if $g'(y) > h'(x - y)$ for $0 \le y \le x$. Hence, $y_2(x) = x$, and, similarly, $y_n(x) = x$.

Let us now take the case in which $y_2(x) = x$, $y_1(x) \ne x$. Since $y_2(x) = x$ implies that $g'(y) - h'(x - y) + (a - b)\,f_1'[ay + b(x - y)] > 0$ for all $0 \le y \le x$, we have, in particular, $g'(x) - h'(0) + (a - b)\,f_1'(ax) > 0$. Since

$$f_2'(x) = g'(x) + a\,f_1'(ax) = g'(x) - h'(0) + (a - b)\,f_1'(ax) + h'(0) + b\,f_1'(ax) > h'(0) \tag{4.57}$$

and $f_1'(x) = h'(x - y_1) < h'(0)$, we see that $f_2'(x) > f_1'(x)$. This, as above, implies that $y_3 \ge y_2 = x$, and the process continues.

Let us note, finally, that if $g'(y) > h'(x - y)$ for all y in [0, x], then $g'(y) > h'(z - y)$ for y in [0, z] for all z < x.

In closing this discussion of the functional equation, let us observe that if an interior maximum exists, we must have

$$g'(y) - h'(x - y) + (a - b)\,f'[ay + b(x - y)] = 0, \tag{4.58}$$

and

$$f'(x) = h'(x - y) + b\,f'[ay + b(x - y)]. \tag{4.59}$$

These equations may be solved explicitly for y and f(x) if g and h are quadratic. This particular solution also furnishes a useful approximation to the solution of the general case.


4.6. The Equation $f(x) = \max_{0 \le y \le x} [g(y) + h(x - y) + \int_0^y f(y - s)k(s)\,ds]$

As another application of the techniques we have developed, let us now consider the functional equation

$$f(x) = \max_{0 \le y \le x} \left[ g(y) + h(x - y) + \int_0^y f(y - s)\,k(s)\,ds \right], \tag{4.60}$$

where we shall assume

(a) $g(0) = h(0) = 0$,
(b) $g'(y) > 0$, $h'(y) > 0$, $g'(0) < h'(0)$,
(c) $k(s) > 0$,
(d) $g''(y) > 0$, $h''(y) > 0$,
(e) $h(y) - g(y)$ is monotone increasing in y.

We shall use the successive approximations defined by

$$f_0(x) = \max_{0 \le y \le x} [g(y) + h(x - y)], \tag{4.61}$$

$$f_{n+1}(x) = \max_{0 \le y \le x} \left[ g(y) + h(x - y) + \int_0^y f_n(y - s)\,k(s)\,ds \right]. \tag{4.62}$$

Let us consider $f_0(x)$. For x small the maximum is attained at y = 0, since $g'(0) < h'(0)$. Furthermore, we see that g(y) + h(x - y) is monotone decreasing for small x, as a function of y. Since g'(x) surpasses h'(0) for x large, g(y) + h(x - y) will eventually possess a turning point that is a minimum (see Fig. 4.1). The maximum will stay at y = 0 until the point $x_1$ where g(x) = h(x), at which point y = 0 and y = x yield the same value for g(y) + h(x - y). There is only one turning point, since g'(y) - h'(x - y) = 0 can have only one solution for $0 \le y \le x$. This is a consequence of our assumption that g''(y) > 0, h''(y) > 0, which means that g'(y) is monotone increasing, whereas h'(x - y) is monotone decreasing. It follows that

$$f_0(x) = \begin{cases} h(x), & 0 \le x \le x_1, \\ g(x), & x_1 \le x, \end{cases} \;=\; \max[g(x), h(x)]. \tag{4.63}$$

Consider the function $f_1$. We have

$$f_1(x) = \max_{0 \le y \le x} \left[ g(y) + h(x - y) + \int_0^y f_0(y - s)\,k(s)\,ds \right]. \tag{4.64}$$

The function $g(y) + h(x - y) + \int_0^y f_0(y - s)\,k(s)\,ds$ is monotone decreasing for small y and possesses turning points for the y values satisfying

$$g'(y) + \int_0^y f_0'(y - s)\,k(s)\,ds = h'(x - y). \tag{4.65}$$

Since $f_0(x)$ is again a convex function, we see that the left-hand side of (4.65) is monotone increasing for $0 \le y \le x$, whereas the right-hand side is monotone decreasing. Hence, there is again one solution at most. Let $x_2$ be the first value of x for which

$$h(x) = g(x) + \int_0^x f_0(x - s)\,k(s)\,ds. \tag{4.66}$$

Since $f_0 \ge 0$, $k(s) \ge 0$, we have $x_2 \le x_1$.

Furthermore, since $g(x) - h(x) + \int_0^x f_0(x - s)\,k(s)\,ds$ is monotone increasing, the solution takes the form

$$f_1(x) = \max\left[ h(x),\; g(x) + \int_0^x f_0(x - s)\,k(s)\,ds \right] = \begin{cases} h(x), & 0 \le x \le x_2, \\ g(x) + \int_0^x f_0(x - s)\,k(s)\,ds, & x \ge x_2. \end{cases} \tag{4.67}$$

From this it is clear that $f_1(x)$ is again convex. In exactly the same fashion we obtain

$$f_n(x) = \begin{cases} h(x), & 0 \le x \le x_{n+1}, \\ g(x) + \int_0^x f_{n-1}(x - s)\,k(s)\,ds, & x \ge x_{n+1}. \end{cases} \tag{4.68}$$

Since $f_1 \ge f_0$, we obtain $f_{n+1} \ge f_n$ and $x_{n+1} \le x_n \le \cdots \le x_1$. The numbers $x_n$ are monotone decreasing and approach a limit $\bar{x}$. Since $f_n$ converges to f(x), the solution of (4.60), we obtain

$$f(x) = \begin{cases} h(x), & 0 \le x \le \bar{x}, \\ g(x) + \int_0^x f(x - s)\,k(s)\,ds, & x \ge \bar{x}. \end{cases} \tag{4.69}$$

This proves that $\bar{x}$ does not equal zero, since f(x) = h(x) for a small positive interval about 0, as we see on comparing

$$g(x) + \int_0^x f(x - s)\,k(s)\,ds = g(x) + O(x^2) \tag{4.70}$$

with h(x) for x small.

The number $\bar{x}$ is determined as the non-zero root of

$$h(x) = g(x) + \int_0^x h(x - s)\,k(s)\,ds. \tag{4.71}$$
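
The non-zero root of an equation of the form h(x) = g(x) + ∫ h(x - s) k(s) ds, the integral being taken from 0 to x, can be located numerically by bisection once a bracketing interval is known. The sketch below uses the trapezoidal rule for the integral; the function name `threshold`, the quadrature rule, and the requirement that the caller supply a bracket [lo, hi] with a sign change are assumptions made for illustration.

```python
def threshold(g, h, k, lo, hi, n_quad=400, tol=1e-10):
    """Bisection for the root of h(x) - g(x) - int_0^x h(x - s) k(s) ds,
    assuming this difference is positive at lo and negative at hi."""
    def gap(x):
        ds = x / n_quad
        # Trapezoidal rule for the convolution integral on [0, x].
        total = 0.5 * (h(x) * k(0.0) + h(0.0) * k(x))
        total += sum(h(x - i * ds) * k(i * ds) for i in range(1, n_quad))
        return h(x) - g(x) - total * ds

    if not (gap(lo) > 0.0 > gap(hi)):
        raise ValueError("the interval [lo, hi] does not bracket a root")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

As a check, take the hypothetical data g(x) = x^2, h(x) = x, k(s) = 1 (h being the borderline linear case of convexity). The equation becomes x = x^2 + x^2/2, whose non-zero root is 2/3, and the bisection reproduces this value.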


4.7.  The  Optimal  Inventory  Equation 

Consider the functional equation

$$u(x) = \min_{y \ge x} \left\{ g(y - x) + a\left[ (M + u(0))\,e^{-by} + b \int_0^y e^{-bv}\,u(y - v)\,dv \right] \right\}, \tag{4.72}$$

where we assume

(a) $g(0) = 0$, $g'(y) > 0$, $g''(y) > 0$,
(b) $b, M > 0$, $0 < a < 1$.        (4.73)

We shall approximate to u by means of the sequence

$$u_0(x) = a\left[ (M + u_0(0))\,e^{-bx} + b \int_0^x e^{-bv}\,u_0(x - v)\,dv \right],$$

$$u_{n+1}(x) = \min_{y \ge x} \left\{ g(y - x) + a\left[ (M + u_n(0))\,e^{-by} + b \int_0^y e^{-bv}\,u_n(y - v)\,dv \right] \right\}. \tag{4.74}$$

The  function  u0(x)  obtained  by  setting  y  =  x  for  all  x  corresponds  to  a  policy  of 
never  ordering. 

Let us now determine some of the important properties of $u_0(x)$. Using (4.74) and setting x = 0, we obtain

$$u_0(0) = \frac{aM}{1 - a}. \tag{4.75}$$

Thus, the equation for $u_0$ takes the form

$$u_0(x) = \frac{aM}{1 - a}\,e^{-bx} + ab \int_0^x e^{-bv}\,u_0(x - v)\,dv, \tag{4.76}$$

which is a simple representative of a renewal equation and may be solved explicitly. For our purposes, however, there is no need of this, since the properties we require may be obtained directly from (4.76). Since the solution may be obtained quite easily, we note that it is given by

$$u_0(x) = \frac{aM}{1 - a}\,e^{-b(1 - a)x}. \tag{4.77}$$

Referring to (4.76), we have

$$u_0'(x) = -abM e^{-bx} + ab \int_0^x e^{-bv}\,u_0'(x - v)\,dv, \tag{4.78}$$

which shows inductively or by direct solution via iteration that $u_0'(x) < 0$. Furthermore, $u_0'(0) = -abM$. Differentiating again, we obtain

$$u_0''(x) = ab^2 M e^{-bx} + ab\,e^{-bx}\,u_0'(0) + ab \int_0^x e^{-bv}\,u_0''(x - v)\,dv = ab^2 M (1 - a)\,e^{-bx} + ab \int_0^x e^{-bv}\,u_0''(x - v)\,dv, \tag{4.79}$$

which shows that $u_0''(x) > 0$, again directly by iteration or inductively.
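
A closed form for the never-ordering return can be checked numerically. Substituting w(x) = e^{bx} u_0(x) into the renewal equation u_0(x) = [aM/(1 - a)] e^{-bx} + ab ∫ e^{-bv} u_0(x - v) dv (integral from 0 to x) gives w' = ab w, hence u_0(x) = [aM/(1 - a)] e^{-b(1 - a)x}. The sketch below evaluates the difference between the two sides with the trapezoidal rule; the function name `renewal_residual` and the quadrature settings are assumptions made for illustration.

```python
import math


def renewal_residual(a, b, M, x, n_quad=2000):
    """Left side minus right side of the renewal equation for u_0 at x,
    using the candidate solution u_0(t) = a*M/(1 - a) * exp(-b*(1 - a)*t)."""
    def u0(t):
        return a * M / (1.0 - a) * math.exp(-b * (1.0 - a) * t)

    def integrand(v):
        return math.exp(-b * v) * u0(x - v)

    dv = x / n_quad
    # Trapezoidal rule on [0, x].
    integral = 0.5 * (integrand(0.0) + integrand(x))
    integral += sum(integrand(i * dv) for i in range(1, n_quad))
    integral *= dv
    rhs = a * M / (1.0 - a) * math.exp(-b * x) + a * b * integral
    return u0(x) - rhs
```

For sample values such as a = 0.5, b = 1, M = 2 the residual is at the level of the quadrature error, and the candidate is visibly decreasing and convex, in agreement with the signs obtained for the first and second derivatives.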
Consider the function

$$u_1(x) = \min_{y \ge x} \left\{ g(y - x) + a\left[ (M + u_0(0))\,e^{-by} + b \int_0^y e^{-bv}\,u_0(y - v)\,dv \right] \right\} = \min_{y \ge x} [g(y - x) + u_0(y)]. \tag{4.80}$$

$$u_1'(x) = -g'(y_1 - x) + y_1'\,[g'(y_1 - x) + u_0'(y_1)] = -g'(y_1 - x) = u_0'(y_1) \tag{4.85}$$

on referring to (4.81). Since $y_1 > x$, and $-u_0'(x)$ is decreasing, we obtain

$$-u_1'(x) = -u_0'(y_1) < -u_0'(x). \tag{4.86}$$

From (4.85) we obtain $u_1''(x) = u_0''(y_1)\,y_1'$. Since $u_0''(y_1) > 0$, the sign of $u_1''$ depends on that of $y_1'$. From (4.81) we obtain $g''(y_1 - x)(y_1' - 1) = -u_0''(y_1)\,y_1'$, which shows that $0 < y_1' < 1$. Hence, $u_1'' > 0$.


Consider now the equation for $u_2(x)$:

$$u_2(x) = \min_{y \ge x} \left\{ g(y - x) + a\left[ (M + u_1(0))\,e^{-by} + b \int_0^y e^{-bv}\,u_1(y - v)\,dv \right] \right\}. \tag{4.87}$$

Taking the partial derivative of the expression within the brackets with respect to y, we obtain

$$g'(y - x) + ab \int_0^y e^{-bv}\,u_1'(y - v)\,dv - abM e^{-by}. \tag{4.88}$$

Setting this equal to zero, we obtain

$$g'(y - x) = abM e^{-by} - ab \int_0^y e^{-bv}\,u_1'(y - v)\,dv. \tag{4.89}$$

Let us consider the function

$$\phi_2(y) = a\left[ Mb\,e^{-by} - b \int_0^y e^{-bv}\,u_1'(y - v)\,dv \right]. \tag{4.90}$$

We have

$$\phi_2'(y) = -ab^2 M e^{-by} - ab\,e^{-by}\,u_1'(0) - ab \int_0^y e^{-bv}\,u_1''(y - v)\,dv = e^{-by}\,ab\,[-bM - u_1'(0)] - ab \int_0^y e^{-bv}\,u_1''(y - v)\,dv. \tag{4.91}$$

Since $-u_1'(0) < -u_0'(0) = abM$, the quantity $-bM - u_1'(0)$ is negative. Hence, $\phi_2'(y) < 0$. Thus, there is one solution at most of the equation in (4.89).

Since $-u_1'(x) < -u_0'(x)$ for all x, the curve $\phi_2(y)$ lies below the curve $\phi_1(y) = abM e^{-by} - ab \int_0^y e^{-bv}\,u_0'(y - v)\,dv$ for all y. Therefore, the intersection of $\phi_2$ with g'(y - x) always lies to the left of the intersection of g'(y - x) with $\phi_1(y)$, as shown in Fig. 4.3.

[Fig. 4.3]

Let $x_2$ be the solution of $g'(0) = \phi_2(x)$. Then $x_2 < x_1$. Therefore,

$$u_2(x) = \begin{cases} a\left[ e^{-bx}(M + u_1(0)) + b \int_0^x u_1(x - v)\,e^{-bv}\,dv \right], & x > x_2, \\ g(y_2 - x) + a\left[ e^{-by_2}(M + u_1(0)) + b \int_0^{y_2} u_1(y_2 - v)\,e^{-bv}\,dv \right], & 0 \le x \le x_2. \end{cases} \tag{4.92}$$

From this we conclude that

$$u_2'(x) = \begin{cases} -abM e^{-bx} + ab \int_0^x u_1'(x - v)\,e^{-bv}\,dv, & x > x_2, \\ -g'(y_2 - x), & 0 \le x \le x_2. \end{cases} \tag{4.93}$$

Comparing the expressions for $u_1'$ and $u_2'$ for $x > x_1$ and for $0 \le x \le x_2$, we readily conclude that $-u_2'(x) < -u_1'(x)$. For $x_2 < x < x_1$, we note that

$$g'(0) > abM e^{-bx} - ab \int_0^x u_1'(x - v)\,e^{-bv}\,dv. \tag{4.94}$$

Since $g'(y_1 - x) > g'(0)$, we see that $-u_1'(x) > -u_2'(x)$ for all x.

Let us now examine the convexity of $u_2(x)$. The difficult region is $0 \le x \le x_2$. Here we have, using (4.93),

$$u_2'' = -g''(y_2 - x)(y_2' - 1). \tag{4.95}$$

We see that the sign of $u_2''$ depends on that of $y_2' - 1$. Referring to (4.89), the equation which defines $y_2$, we have, differentiating with respect to x, with $y = y_2$,

$$g''(y - x)(y' - 1) = y'\left[ -ab^2 M e^{-by} - ab\,e^{-by}\,u_1'(0) - ab \int_0^y e^{-bv}\,u_1''(y - v)\,dv \right]. \tag{4.96}$$

Since $u_1'' > 0$ and $-u_1'(0) < -u_0'(0) = abM$, we see that the coefficient of $y'$ is negative. Since $g'' > 0$, this shows that $y' > 0$. Referring to (4.96), we see that this implies that $y' - 1 < 0$. Hence, $u_2'' > 0$.

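We now have all the results required for an inductive proof. The expression for $u_n$ is

$$u_n = g(y_n - x) + a\left[ e^{-by_n}(M + u_{n-1}(0)) + b \int_0^{y_n} u_{n-1}(y_n - v)\,e^{-bv}\,dv \right] \tag{4.97}$$

for $0 \le x \le x_n$, and

$$u_n = a\left[ e^{-bx}(M + u_{n-1}(0)) + b \int_0^x u_{n-1}(x - v)\,e^{-bv}\,dv \right] \tag{4.98}$$

for $x > x_n$, where

$$0 < x_n \le \cdots \le x_2 \le x_1, \qquad 0 < y_n \le \cdots \le y_2 \le y_1. \tag{4.99}$$

Using the monotone properties and letting $n \to \infty$, we obtain for u(x) a representation

$$u = g[y(x) - x] + a\left[ e^{-by(x)}(M + u(0)) + b \int_0^{y(x)} u[y(x) - v]\,e^{-bv}\,dv \right] \tag{4.100}$$

for $0 \le x \le x_\infty$, and

$$u = a\left[ e^{-bx}(M + u(0)) + b \int_0^x u(x - v)\,e^{-bv}\,dv \right]$$

for $x > x_\infty$.

We now wish to show that $x_\infty \ne 0$. For small x and y we have

$$u(x) = \min_{y \ge x} \left[ g'(0)(y - x) + a\left\{ [M + u(0)]\,e^{-by} + b \int_0^y e^{-bv}\,[u(0) + O(y - v)]\,dv \right\} \right] = \min_{y \ge x} \left[ g'(0)(y - x) + a[M + u(0)] - abMy + O(y^2) \right]. \tag{4.101}$$

This shows that for small x the minimum is not at y = x.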

4.8. The Solution for Linear Cost

Let us consider the simple case in which g(y) = cy, where c is a positive constant. The equation is now

$$u(x) = \min_{y \ge x} \left[ c(y - x) + a\left\{ [M + u(0)]\,e^{-by} + b \int_0^y e^{-bv}\,u(y - v)\,dv \right\} \right]. \tag{4.102}$$

For $0 \le x \le x_\infty$, we choose y = y(x) where

$$c = abM e^{-by} - ab \int_0^y e^{-bv}\,u'(y - v)\,dv. \tag{4.103}$$

Since the equation is independent of x, the solution, which we know to be unique, is $y = x_\infty$.

The optimal policy is then to choose $y = x_\infty$ if $x < x_\infty$ and to choose y = x if $x \ge x_\infty$. The function u(x) satisfies the equation

$$u(x) = \begin{cases} a\left[ (M + u(0))\,e^{-bx} + b \int_0^x e^{-bv}\,u(x - v)\,dv \right], & x \ge x_\infty, \\ c(x_\infty - x) + a\left[ (M + u(0))\,e^{-bx_\infty} + b \int_0^{x_\infty} e^{-bv}\,u(x_\infty - v)\,dv \right], & 0 \le x \le x_\infty. \end{cases} \tag{4.104}$$

For $0 \le x \le x_\infty$, we have $u'(x) = -c$, whence $u(x) = u(0) - cx$ in $[0, x_\infty]$. Since $x_\infty$ is determined by (4.103), we may use the second equation in (4.104) to find u(0). Having determined the solution in $[0, x_\infty]$, the solution for larger x is found by solving the first equation in (4.104), a simple renewal equation.


CHAPTER  5 


THE EQUATION $u(n) = \max_{1 \le i \le K} \left[ \sum_{j=1}^{R} a_{ij}\,u(n - j) + c_i \right]$

AND RELATED TOPICS


5.1.  Introduction 


In this chapter we consider a number of functional equations that are more or less loosely connected. The first equation is

$$u(n) = \max_{1 \le i \le K} \left[ \sum_{j=1}^{R} a_{ij}\,u(n - j) + c_i \right], \tag{5.1}$$

the homogeneous form of which was encountered in Section 1.6, Problem 1.11. After discussing the asymptotic behavior of the solutions of (5.1) for the case in which the $a_{ij}$ are all non-negative, we shall discuss a problem in production planning that gives rise to the functional equation

$$f_N(x) = \max[f_{N-1}(Ax),\; f_{N-1}(Bx)], \qquad N = 1, 2, \ldots, \tag{5.2}$$

where x is a two-dimensional vector, A and B are 2 × 2 positive matrices, and $f_0(x) = c_1 x_1 + c_2 x_2$. This problem seems extremely difficult, and we are able only to contribute some partial results, which are, however, of interest in themselves.

We  shall  close  with  a  solution  of  the  simple  testing  equation 


fl 

=  Min 


+  */(!)' 
+  /(<**)_ 


x>0. 


/(0)  =  0. 


5.2. The Equation $u(n) = \max_{1 \le i \le K} \left[ \sum_{j=1}^{R} a_{ij}\,u(n - j) \right]$

We shall begin our discussion with the homogeneous equation

$$u(n) = \max_{1 \le i \le K} \sum_{j=1}^{R} a_{ij}\,u(n - j), \qquad n \ge R, \tag{5.4}$$

where u(j) is a given non-negative quantity for $0 \le j \le R - 1$.
Our first result is

Theorem 5.1. Consider equation (5.4), in which we assume that

(a) $a_{ij} \ge 0$;
(b) there is one equation, $r^R = \sum_{j=1}^{R} a_{kj}\,r^{R-j}$, whose largest positive root is greater than the corresponding roots of the other equations of this type;
(c) for this index k, $a_{k1} \ne 0$.        (5.5)

Under these circumstances, the solution of (5.4) is given by

$$u(n) = \sum_{j=1}^{R} a_{kj}\,u(n - j), \tag{5.6}$$

for n sufficiently large.

Proof. For the sake of simplicity, consider the third-order case with $K = 2$:

$$u(n + 3) = \max \begin{bmatrix} A:\; a_1 u(n + 2) + b_1 u(n + 1) + c_1 u(n) \\ B:\; a_2 u(n + 2) + b_2 u(n + 1) + c_2 u(n) \end{bmatrix}, \tag{5.7}$$

where $u(0)$, $u(1)$, $u(2)$ are preassigned positive quantities. Let us assume that of the
two equations

$$r^3 = a_1 r^2 + b_1 r + c_1, \qquad r^3 = a_2 r^2 + b_2 r + c_2, \tag{5.8}$$

it is the first that has the largest positive root, and let $\rho$ be this root.

Let us first show inductively that

$$e\rho^n \le u(n) \le f\rho^n \tag{5.9}$$

for two positive constants $e$ and $f$. Consider the lower inequality first. Let $e$ be chosen so
that the inequality is valid for $n = 0, 1, 2$. Then, since

$$u(n + 3) \ge a_1 u(n + 2) + b_1 u(n + 1) + c_1 u(n), \tag{5.10}$$

we obtain

$$u(3) \ge e(a_1\rho^2 + b_1\rho + c_1) = e\rho^3, \tag{5.11}$$

and clearly an inductive argument yields the inequality for all $n$.

To obtain the upper inequality, we proceed similarly. The constant $f$ may be chosen
so that the upper inequality is valid for $n = 0, 1, 2$. Then

$$u(3) \le \max \begin{bmatrix} f(a_1\rho^2 + b_1\rho + c_1) = f\rho^3 \\ f(a_2\rho^2 + b_2\rho + c_2) \le f\rho^3 \end{bmatrix}, \tag{5.12}$$

where the last inequality is a consequence of the maximal property of $\rho$. It is again clear
that an inductive argument yields the upper inequality.

To prove Theorem 5.1, we show that the assumption that B is employed infinitely
often leads to an eventual contradiction of the lower inequality. If B is used for
$m = n + 3$, $n \ge 0$, we have

$$u(n + 3) = a_2 u(n + 2) + b_2 u(n + 1) + c_2 u(n) \le f\rho^n\,[a_2\rho^2 + b_2\rho + c_2]. \tag{5.13}$$

Using the maximal property of $\rho$, we have $a_2\rho^2 + b_2\rho + c_2 \le c_3\rho^3$, where $0 < c_3 < 1$.
Hence, we obtain $u(n + 3) \le c_3 f\rho^{n+3}$.

Now consider $u(n + 4)$. We obtain

$$\begin{aligned}
u(n + 4) &= \max \begin{bmatrix} a_1 u(n + 3) + b_1 u(n + 2) + c_1 u(n + 1) \\ a_2 u(n + 3) + b_2 u(n + 2) + c_2 u(n + 1) \end{bmatrix} \\
&\le \max \begin{bmatrix} f\rho^{n+1}(a_1 c_3\rho^2 + b_1\rho + c_1) \\ f\rho^{n+1}(a_2 c_3\rho^2 + b_2\rho + c_2) \end{bmatrix} \\
&\le f c_4\rho^{n+4},
\end{aligned} \tag{5.14}$$

where $0 < c_4 < 1$, and the constant $c_4$ is again independent of $n$. Observe that the
condition $a_1 \ne 0$ is essential for our proof. In exactly the same fashion we find that
$u(n + 5) \le f c_5\rho^{n+5}$, $0 < c_5 < 1$. Having established the relation $u(m) \le c_6 f\rho^m$ for
$m = n + 3$, $n + 4$, $n + 5$, three consecutive values of $m$, where $c_6 = \max(c_3, c_4, c_5)$,
it follows from the recurrence relation (5.4) that this inequality is valid for all
subsequent $m$.

We see then that the effect of employing B once is to reduce the constant $f$. It
follows that a choice of B infinitely often will eventually lead to a contradiction of the lower
bound $u(n) \ge e\rho^n$. Consequently, there is a number $n_0$ dependent on the coefficients
and initial values such that for $n \ge n_0$, B is not employed. The proof given above enables
one to obtain an upper bound for the number of times that B is employed. Combining
this fact with the easily demonstrated fact that a choice of A for any three consecutive
values of $n$ implies that it is chosen for all larger values of $n$, we may obtain a number
$n_0$ with the property that for $n \ge n_0$, A is always used.
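The conclusion of Theorem 5.1 can be checked numerically on a small instance. The sketch below is only an illustration: the two coefficient sets, the initial values, and the helper names `simulate` and `largest_positive_root` are assumptions chosen for the example and do not come from the text. Policy A has the larger characteristic root, so we expect B to appear only finitely often.

```python
def simulate(coeffs, init, steps):
    """Iterate u(n) = max_i sum_j coeffs[i][j] * u(n - 1 - j).

    coeffs[i] lists the coefficients of u(n-1), u(n-2), ..., u(n-R) for policy i.
    Returns the values u(0), u(1), ... and the policy index chosen at each n >= R.
    """
    u = list(init)
    choices = []
    for n in range(len(init), len(init) + steps):
        values = [sum(a * u[n - 1 - j] for j, a in enumerate(row)) for row in coeffs]
        best = max(range(len(coeffs)), key=lambda i: values[i])
        u.append(values[best])
        choices.append(best)
    return u, choices


def largest_positive_root(row, hi=100.0):
    """Bisection for the positive root of r^R = sum_j row[j] * r^(R-1-j).

    Assumes non-negative coefficients (so the positive root is unique)
    and that the root lies below `hi`.
    """
    R = len(row)
    lo = 0.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if mid**R - sum(a * mid ** (R - 1 - j) for j, a in enumerate(row)) > 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


# Hypothetical data: policy 0 plays the role of A, policy 1 the role of B.
A = (1.0, 1.0, 1.0)
B = (0.5, 2.0, 0.5)
u, choices = simulate([A, B], init=(1.0, 5.0, 1.0), steps=40)
print(largest_positive_root(A), largest_positive_root(B))
print(choices)
```

With these values B is selected at the first step, since the initial data favor it, but only A is selected once the dominant root has taken over.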

The condition that $a_1 \ne 0$ is necessary for the truth of the result in general. It is not
difficult to verify that if

$$u(n + 2) = \max \begin{bmatrix} A:\; b\,u(n) \\ B:\; c\,u(n + 1) + \epsilon\,u(n) \end{bmatrix}, \qquad u(0) = 1, \quad u(1) = c + \epsilon, \tag{5.15}$$

where $c^2 < b < c$ and $\epsilon$ is sufficiently small and positive, then the optimal pattern is B
for odd $n$, A for even $n$.
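The alternating pattern in (5.15) is easy to confirm by direct iteration. The following sketch uses exact rational arithmetic, because the margin by which B wins at odd indices shrinks geometrically and would be lost in floating point. The particular values of b, c, and ε, and the function name `alternating_pattern`, are assumptions made for the illustration.

```python
from fractions import Fraction


def alternating_pattern(b, c, eps, steps):
    """Iterate (5.15) exactly and record which branch attains the maximum.

    Entry i of the result is the branch used to compute u(i + 2).
    """
    u = [Fraction(1), c + eps]          # u(0) = 1, u(1) = c + eps
    pattern = []
    for n in range(2, 2 + steps):
        via_a = b * u[n - 2]                       # A: b u(n-2)
        via_b = c * u[n - 1] + eps * u[n - 2]      # B: c u(n-1) + eps u(n-2)
        pattern.append("A" if via_a >= via_b else "B")
        u.append(max(via_a, via_b))
    return pattern


# Hypothetical values satisfying c^2 < b < c with a small positive eps.
pattern = alternating_pattern(Fraction(3, 10), Fraction(1, 2), Fraction(1, 1000), 40)
print("".join(pattern))
```

For these values the branches alternate from the first step on: A at even indices and B at odd indices, so no single policy is used for all large n.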

A  finer  analysis  will  show  in  the  general  case  that  the  optimal  pattern  is  always 
eventually  periodic. 

The case in which at least two characteristic equations have the same maximum root
is more difficult to handle. It is easily seen that for large $n$ only those choices
corresponding to largest roots will be used, and it is not difficult to show by consideration of
the quantities

$$\phi(n) = \min\,[V(n),\, V(n + 1),\, V(n + 2)], \qquad \psi(n) = \max\,[V(n),\, V(n + 1),\, V(n + 2)], \tag{5.16}$$

where $V(n) = u(n)\rho^{-n}$, the first of which is monotone increasing and the second
monotone decreasing, that $u(n)\rho^{-n}$ approaches a limit as $n \to \infty$. However, any further
information concerning asymptotic behavior seems to be difficult to obtain.

5.3.  The  Inhomogeneous  Equation 

Let us now consider the equation

$$u(n) = \max_{i} \Big[\sum_{j=1}^{R} a_{ij}\,u(n - j) + g_i\Big], \tag{5.17}$$

for the case in which $a_{ij} \ge 0$ and $g_i > 0$, where again the quantities $u(l)$,
$0 \le l \le R - 1$, are given positive constants. The most interesting case is that in which

$$\sum_{j=1}^{R} a_{ij} = 1 \qquad \text{for } i = 1, 2, \ldots, M.$$

Since each equation has largest characteristic root equal to 1, it is the forcing term
that dominates the situation for large $n$.

From the theory of linear difference equations, it is known that the solution of any
recurrence relation of the form

$$u(n) = \sum_{j=1}^{R} a_{ij}\,u(n - j) + g_i, \quad n \ge R, \qquad u(l) = c_l, \quad 0 \le l \le R - 1, \tag{5.18}$$

where $\sum_{j=1}^{R} a_{ij} = 1$, $a_{i1} \ne 0$, has the form

$$u(n) = \frac{n\,g_i}{\sum_{j=1}^{R} j\,a_{ij}} + d_i + O(\alpha^n), \qquad 0 < \alpha < 1, \tag{5.19}$$

for large $n$, where $d_i$ is a constant dependent on the initial conditions.
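The leading term of this asymptotic form can be checked by a short substitution. The sketch below assumes a trial solution $u(n) = cn + d$, where the symbols $c$ and $d$ are introduced here only for the check, and uses the normalization $\sum_j a_{ij} = 1$:

```latex
\begin{aligned}
cn + d &= \sum_{j=1}^{R} a_{ij}\,\bigl[c(n - j) + d\bigr] + g_i \\
       &= cn + d - c\sum_{j=1}^{R} j\,a_{ij} + g_i ,
\end{aligned}
\qquad\text{so that}\qquad
c = \frac{g_i}{\sum_{j=1}^{R} j\,a_{ij}} .
```

The constant $d$ is left free by the substitution, which agrees with its dependence on the initial conditions.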

We should suspect, then, that the solution of (5.17) would be determined, for large $n$,
by the index $i$ for which $g_i / \sum_{j=1}^{R} j\,a_{ij}$ assumes its maximum. This is indeed true.
We shall prove

Theorem 5.2. Let

$$c = \max_{i} \frac{g_i}{\sum_{j=1}^{R} j\,a_{ij}} \tag{5.20}$$

be attained for the single value $i = s$. If $a_{s1} > 0$, the solution of (5.17) is given by

$$u(n) = \sum_{j=1}^{R} a_{sj}\,u(n - j) + g_s \tag{5.21}$$

for $n \ge n_0$, where $n_0$ is an integer dependent on the initial conditions and coefficients.
Proof. Let us establish first the inequalities

$$nc - K \le u(n) \le nc + K \tag{5.22}$$

for all $n$, with a suitable choice of $K$. For $0 \le n \le R$, we may choose $K$ so that the
inequality is valid. Let us now establish it inductively for $n \ge R + 1$. We have

$$\begin{aligned}
u(n) &\ge \sum_{j=1}^{R} a_{sj}\,u(n - j) + g_s \\
     &\ge \sum_{j=1}^{R} a_{sj}\,[(n - j)c - K] + g_s \\
     &= nc - K,
\end{aligned} \tag{5.23}$$

using the value for $c$ given in (5.20).

To establish the upper bound, we use the fact that if the $i$th choice is made at $n$,
we have

$$\begin{aligned}
u(n) &= \sum_{j=1}^{R} a_{ij}\,u(n - j) + g_i \\
     &\le \sum_{j=1}^{R} a_{ij}\,[(n - j)c + K] + g_i \\
     &\le nc + K + g_i - c\sum_{j=1}^{R} j\,a_{ij} \\
     &\le nc + K,
\end{aligned} \tag{5.24}$$

using the optimal property of the index $s$.

Following the same reasoning as that above, let us show that if any other choice than
the $s$th is made at $n$, the upper bound will be decreased. As before, this will show that
the index $s$ must be selected for all large $n$.

Referring to (5.24), we see that if $i \ne s$, we have

$$u(n) \le nc + K + g_i - c\sum_{j=1}^{R} j\,a_{ij} \le nc + K - d_1, \tag{5.25}$$

where $d_1 > 0$. Consider now the situation at $n + 1$. We obtain, for some $i$,

$$\begin{aligned}
u(n + 1) &= \sum_{j=1}^{R} a_{ij}\,u(n + 1 - j) + g_i \\
 &\le a_{i1}[nc + K - d_1] + \sum_{j=2}^{R} a_{ij}\,[(1 + n - j)c + K] + g_i \\
 &\le (n + 1)c + K - a_{i1}d_1 + g_i - c\sum_{j=1}^{R} j\,a_{ij} \\
 &\le (n + 1)c + K - d_2,
\end{aligned} \tag{5.26}$$

where $d_2 > 0$, since if $i = s$, we have $a_{s1} > 0$, and if $i \ne s$, we have

$$g_i - c\sum_{j=1}^{R} j\,a_{ij} < 0.$$

Continuing in this fashion, we see that for $m = n, n + 1, \ldots, n + R$ we obtain a positive
constant $d$ such that $u(m) \le mc + K - d$, if $i \ne s$ is selected at $n$. Having established
this upper bound for $R$ consecutive values, it follows via induction that the inequality
persists for all larger values. We observe then that a repeated application of choices
different from $s$ cannot yield an optimal policy, since eventually we shall obtain a
contradiction to the lower inequality.

If there are several choices yielding the same $c$, the above argument shows that we
may restrict ourselves to considering only these choices. The asymptotic behavior will be
the same as above, namely, $u(n) \sim nc$, and the result of varying choices will be
negligible. Nevertheless, it is an interesting open problem to determine the asymptotic form
of the solution in this case.
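Theorem 5.2 can likewise be illustrated on a small instance. In the sketch below the two policies, the initial values, and the function names `simulate_inhomogeneous` and `growth_rate` are assumptions made for the example. Each coefficient row sums to 1, and the policy with the larger ratio $g_i / \sum_j j\,a_{ij}$ should be the only one used for large n.

```python
def simulate_inhomogeneous(policies, init, steps):
    """Iterate u(n) = max_i [ sum_j a_i[j] * u(n - 1 - j) + g_i ].

    policies is a list of pairs (a, g); a[j] multiplies u(n - 1 - j)
    and each a is assumed to sum to 1.
    """
    u = list(init)
    choices = []
    for n in range(len(init), len(init) + steps):
        values = [
            sum(aj * u[n - 1 - j] for j, aj in enumerate(a)) + g
            for a, g in policies
        ]
        best = max(range(len(policies)), key=lambda i: values[i])
        u.append(values[best])
        choices.append(best)
    return u, choices


def growth_rate(a, g):
    """The ratio g / sum_j j * a_j that is maximised in (5.20)."""
    return g / sum((j + 1) * aj for j, aj in enumerate(a))


# Hypothetical policies: policy 0 has the larger growth rate (2/3 against 12/19).
policies = [((0.5, 0.5), 1.0), ((0.1, 0.9), 1.2)]
u, choices = simulate_inhomogeneous(policies, init=(1.0, 1.0), steps=300)
print([growth_rate(a, g) for a, g in policies])
print(choices[:10], u[-1] / (len(u) - 1))
```

Policy 1 wins at the first step because of its larger forcing term, but policy 0 takes over permanently after a few steps, and u(n)/n settles near 2/3.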

5.4.  A  Class  of  Problems  Arising  in  Production  Planning 

Let us consider the following simplified problem. We are given an initial stock $x$ and
$y$ of two quantities A and B, and means of producing more of A and B using the initial
amounts. Specifically, we may divide $x$ into two parts, $u_1$ to be used to produce more A,
and $u_2$ to be used to produce more B, and $y$ into two corresponding parts, $v_1$ and $v_2$. The
new amount of A will be $f(u_1, v_1)$, and that of B will be $g(u_2, v_2)$, where $f$ and $g$ are
given functions. This operation is now to be repeated $N$ times, and the general problem
is that of maximizing $h(x_N, y_N)$, where $h$ is a given function.

Problems  of  this  type  arise  in  planning  production  schedules  where  different  tech¬ 
niques  are  applicable  at  each  stage  of  production. 

Since the mathematical problem in its above generality seems to be hopelessly beyond
our reach, let us consider the simpler situation, where

$$f(u, v) = a_1 u + a_2 v, \qquad g(u, v) = b_1 u + b_2 v, \tag{5.27}$$

$$h(x, y) = c_1 x + c_2 y, \tag{5.28}$$

and all the coefficients involved are non-negative.

Another criterion function of interest, which applies to "bottleneck" situations, is

$$k(x, y) = \min\,(x, y). \tag{5.29}$$

We shall not, however, discuss any of these very interesting and important problems here.
If we define

$$W_0(x, y) = c_1 x + c_2 y = h(x, y), \qquad W_N(x, y) = \max\,h(x_N, y_N), \tag{5.30}$$

we obtain for $W_N$ the functional equation

$$W_N(x, y) = \max\,[\,W_{N-1}(a_1 x + a_2 y,\, 0),\; W_{N-1}(0,\, b_1 x + b_2 y),\; W_{N-1}(a_1 x,\, b_2 y),\; W_{N-1}(a_2 y,\, b_1 x)\,] \tag{5.31}$$

for $N \ge 1$, since the linearity of $h(x, y)$ will force $u_1$ to be either 0 or $x$ and, similarly,
$v_1$ to be either 0 or $y$.

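We see, therefore, that we are given four matrices,

$$A_1 = \begin{pmatrix} a_1 & a_2 \\ 0 & 0 \end{pmatrix}, \quad A_2 = \begin{pmatrix} 0 & 0 \\ b_1 & b_2 \end{pmatrix}, \quad A_3 = \begin{pmatrix} a_1 & 0 \\ 0 & b_2 \end{pmatrix}, \quad A_4 = \begin{pmatrix} 0 & a_2 \\ b_1 & 0 \end{pmatrix}, \tag{5.32}$$

and the problem is that of forming a vector,

$$\begin{pmatrix} x_N \\ y_N \end{pmatrix} = C_N C_{N-1} \cdots C_1 \begin{pmatrix} x \\ y \end{pmatrix}, \tag{5.33}$$

where each $C_i$ is an $A_j$, $j = 1, 2, 3, 4$, in such a way as to maximize the inner product of
$(x_N, y_N)$ with a given vector.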

In this form the problem may readily be generalized. However, even in its simplest
forms it seems extremely difficult. If we seek to determine not the actual optimal policy,
in the general case, but merely the order of magnitude of $W_N(x, y)$, the problem is still
difficult. In the succeeding section we shall present a preliminary result in this direction.
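Although no closed form is available, the recurrence for $W_N$ is straightforward to evaluate for small N. The sketch below is one possible rendering under stated assumptions: the four one-stage transformations are those appearing in the functional equation for $W_N$ (all of x and all of y assigned to one product or the other), the numerical coefficients are hypothetical, and the names `transitions`, `W`, and `W_brute_force` are introduced only for this illustration.

```python
from itertools import product


def transitions(a1, a2, b1, b2):
    """The four extreme allocations of one production stage."""
    return [
        lambda x, y: (a1 * x + a2 * y, 0.0),   # all of x and y used for A
        lambda x, y: (0.0, b1 * x + b2 * y),   # all of x and y used for B
        lambda x, y: (a1 * x, b2 * y),         # x used for A, y used for B
        lambda x, y: (a2 * y, b1 * x),         # y used for A, x used for B
    ]


def W(N, x, y, moves, c1, c2):
    """Evaluate W_N(x, y) by the recurrence, with W_0(x, y) = c1*x + c2*y."""
    if N == 0:
        return c1 * x + c2 * y
    return max(W(N - 1, *move(x, y), moves, c1, c2) for move in moves)


def W_brute_force(N, x, y, moves, c1, c2):
    """Enumerate all 4^N sequences of allocations directly."""
    best = float("-inf")
    for sequence in product(moves, repeat=N):
        px, py = x, y
        for move in sequence:
            px, py = move(px, py)
        best = max(best, c1 * px + c2 * py)
    return best


moves = transitions(2.0, 1.0, 1.0, 3.0)     # hypothetical a1, a2, b1, b2
print(W(1, 1.0, 1.0, moves, 1.0, 1.0))       # best single stage from (1, 1)
print(W(4, 1.0, 1.0, moves, 1.0, 1.0))
```

The recursion visits the same $4^N$ branches as the enumeration, so it is useful only for exploring small cases; the difficulty discussed in the text is that no shortcut to the optimal sequence is known.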

5.5. The Problems of Largest Characteristic Root

In this section we shall present a preliminary result for the following problem:

Given a finite set $\{A_i\}$ of non-negative square matrices, determine for each
$N$ the matrix $C_N = B_1 B_2 \cdots B_N$, where each $B_i$ is an $A_j$, which possesses the
largest characteristic root.

The problem has not been resolved even in the simplest non-trivial case of $2 \times 2$
matrices, where the set consists of two non-commuting matrices $A$ and $B$.

Let us introduce the notation

$$\phi(A) = \text{characteristic root of } A \text{ of largest absolute value}. \tag{5.34}$$

It is a classical result of Perron that $\phi(A)$ is positive if $A$ is non-negative, unless all the
characteristic roots of $A$ are zero. To simplify the presentation we shall assume that the
$A_i$ are actually positive. We now prove

Theorem 5.3. $\lambda = \lim_{N \to \infty} \phi(C_N)^{1/N}$ exists.

Proof. Since $C_N^2$ is a possible candidate for $C_{2N}$, it follows that

$$\phi(C_N)^2 = \phi(C_N^2) \le \phi(C_{2N}). \tag{5.35}$$

Letting $N$ run through the values $\{2^k\}$, $k = 0, 1, 2, \ldots$, we see that

$$\lambda(k) = \phi(C_{2^k})^{1/2^k} \tag{5.36}$$

is a monotone-increasing function of $k$.
To show that it is bounded, let us introduce the majorant, $M$, of the matrices $\{A_i\}$,
defined by the property that the $ij$th element in $M$ is the maximum of the set of $ij$th
elements occurring in the $\{A_i\}$. We shall employ the notation

$$A \ll B \tag{5.37}$$

to indicate that $a_{ij} \le b_{ij}$ for all $i$ and $j$. It is known that if $A$ and $B$ are positive matrices,
then $B \gg A$ implies $\phi(B) \ge \phi(A)$. The converse is, however, not true.

Since $M \gg A_i$, we have $M^N \gg C_N$, whence $\phi(C_N) \le \phi(M^N) = \phi(M)^N$.
Consequently, the sequence $\phi(C_N)^{1/N}$ is uniformly bounded for all $N$, and thus $\lambda(k)$ is
uniformly bounded. Since $\lambda(k)$ is monotone increasing, $\lim \lambda(k)$ exists as $k \to \infty$, and
we set

$$\lambda = \lim_{k \to \infty} \lambda(k). \tag{5.38}$$

It remains to show that $\phi(C_N)^{1/N}$ has a limit. Let $k$ be a fixed large number and write,
for $N \ge 2^k$, $N = 2^k q + r$, where $0 \le r \le 2^k - 1$, $q$ and $r$ being integers.

Since $(C_{2^k})^q C_r$ is a possible choice for $C_N$, it follows that
$\phi(C_N) \ge \phi((C_{2^k})^q C_r)$. Since $C_r \gg aI$, where $I$ is the identity matrix, for some
$a > 0$, we have $(C_{2^k})^q C_r \gg a(C_{2^k})^q$. Hence,

$$\phi(C_N) \ge a\,\phi(C_{2^k})^q, \tag{5.39}$$

or

$$\phi(C_N)^{1/N} \ge a^{1/N}\,\phi(C_{2^k})^{q/N}. \tag{5.40}$$

Letting $N \to \infty$, we have

$$\liminf_{N \to \infty} \phi(C_N)^{1/N} \ge \phi(C_{2^k})^{1/2^k}. \tag{5.41}$$

Since this holds for every $k$, we obtain, finally, $\liminf \phi(C_N)^{1/N} \ge \lambda$.

In the above proof we have used positivity only in the statement $C_r \gg aI$. A finer
analysis based on the asymptotic form of $(C_{2^k})^q$ for large $q$ will show that non-negativity
is sufficient.

To obtain the inequality $\limsup \phi(C_N)^{1/N} \le \lambda$, we write $2^k = qN_m + r$, where $\{N_m\}$ is
a sequence on which $\limsup$ is attained. Then consider the matrix $(C_{N_m})^q C_r$. We have, as
before, $\phi(C_{2^k}) \ge a\,\phi(C_{N_m})^q$, whence

$$\phi(C_{2^k})^{1/2^k} \ge a^{1/2^k}\,\phi(C_{N_m})^{q/2^k}. \tag{5.42}$$

Letting $k \to \infty$, we obtain

$$\lambda \ge \phi(C_{N_m})^{1/N_m}, \tag{5.43}$$

and thus $\lambda \ge \limsup$. Combining the two inequalities we obtain equality.
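The two ingredients of this proof, monotonicity along powers of two and the bound furnished by the majorant, can be observed numerically for $2 \times 2$ matrices by brute force. In the sketch below the two positive matrices and all function names are assumptions made for the illustration; the largest root of a non-negative $2 \times 2$ matrix is computed from its trace and off-diagonal product.

```python
from itertools import product
from math import sqrt


def matmul(P, Q):
    """Product of two 2 x 2 matrices given as nested tuples."""
    return tuple(
        tuple(sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def spectral_radius(P):
    """Largest characteristic root of a non-negative 2 x 2 matrix."""
    (a, b), (c, d) = P
    return ((a + d) + sqrt((a - d) ** 2 + 4 * b * c)) / 2


def best_root(mats, N):
    """phi(C_N): the largest root over all products of N factors from mats."""
    best = 0.0
    for word in product(mats, repeat=N):
        P = ((1.0, 0.0), (0.0, 1.0))
        for factor in word:
            P = matmul(P, factor)
        best = max(best, spectral_radius(P))
    return best


# Hypothetical positive matrices and their entrywise majorant M.
A = ((2.0, 1.0), (1.0, 1.0))
B = ((1.0, 3.0), (0.5, 1.0))
M = tuple(tuple(max(A[i][j], B[i][j]) for j in range(2)) for i in range(2))

rates = [best_root((A, B), N) ** (1.0 / N) for N in (1, 2, 4, 8)]
print(rates, spectral_radius(M))
```

The printed rates are non-decreasing along N = 1, 2, 4, 8 and stay below the root of the majorant, as the argument requires. The enumeration grows as $2^N$, so this is a check on small cases rather than a method for finding the limit.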

In very much the same fashion, we may prove

Theorem 5.4. Let $M_N$ denote the smallest majorant of all the products $B_1 B_2 \cdots B_N$,
where each $B_i$ is an $A_j$. Then

$$\lim_{N \to \infty} \phi(M_N)^{1/N} = \mu. \tag{5.44}$$

It is immediate that $\mu \ge \lambda$, and it is conjectured on the basis of no evidence pro or con
that $\mu = \lambda$.

5.6.  A  Testing  Problem 

In  this  section  we  shall  present  the  solution  to  Problem  1.3,  posed  in  Section  1.2. 

Let $f(x)$ equal the expected time consumed using an optimal procedure. Then

$$f(x) = \min \begin{bmatrix} L:\; 1 + x f(1) \\ A:\; 1 + f(ax) \end{bmatrix}, \qquad 1 \ge x \ge 0, \quad 0 < a < 1. \tag{5.45}$$

For $x$ close to zero it is clear that $f(ax) > x f(1)$, since $f(x) \ge 1$ for all $x$. Therefore,
in some interval $[0, x_0]$ we have

$$f(x) = 1 + x f(1), \tag{5.46}$$

where $f(1)$ is some, as yet undetermined, constant.

In $[x_0, x_0/a]$ we obtain

$$f(x) = \min \begin{bmatrix} 1 + x f(1) \\ 1 + [1 + a x f(1)] \end{bmatrix}. \tag{5.47}$$

Hence, we must compare $x f(1)$ with $1 + a x f(1)$. If we assume that $x f(1) < 1 + a x f(1)$
for $x_0 \le x \le x_0/a$, we turn to the next interval $[x_0/a, x_0/a^2]$, and so on. Since,
eventually, $x_0/a^k \ge 1$ if $x_0 \ne 0$, we must in this way either cover the interval $[0, 1]$ or obtain
a point $x_1$ where $1 + x f(1) > 1 + f(ax)$. This certainly is true at $x = 1$, since

$$f(1) = \min \begin{bmatrix} 1 + f(1) \\ 1 + f(a) \end{bmatrix} = 1 + f(a). \tag{5.48}$$

Let us show that A is used to the right of $x_2$, where $x_2$ is the first point at which

$$x_2 f(1) = f(a x_2). \tag{5.49}$$

If L is employed for $x_2/a \ge x \ge x_2$, we have

$$f_L(x) = 1 + x f(1) = 1 + x[1 + f(a)]. \tag{5.50}$$

If A is used, we have

$$f_A(x) = 1 + f(ax) = 1 + [1 + a x f(1)], \tag{5.51}$$

since $ax \le x_2$, which means that L is used there. At $x = x_2$, the two straight lines
$y = 1 + x[1 + f(a)]$, $y = 2 + a x f(1)$ intersect. Hence, for $x > x_2$, one is above the
other. At $x = 0$, one intercept is 1, the other 2; hence,

$$f_{AL} = 2 + a x f(1) \le 1 + x[1 + f(a)] = f_{LA} \tag{5.52}$$

for $x_2 \le x \le x_2/a$. Similarly, we show that $f_{AL} \le f_{LA}$ in $x_2/a \le x \le x_2/a^2$, and so on.
Only a finite number of such steps are required.
It remains to compute $f(1)$ and $x_1$. Let $a^k \le x_1 \le a^{k-1}$. Then

$$f(1) = k + f(a^k) = k + 1 + a^k f(1), \tag{5.53}$$

whence

$$f(1) = \frac{k + 1}{1 - a^k}. \tag{5.54}$$

Since this is a convex function of $k$, the minimum occurs at either a unique $k$ or at two
adjacent $k$'s. Having determined $k$, we have $f(1)$, and then

$$x_1 = \frac{1}{(1 - a) f(1)} = \frac{1 - a^k}{(1 - a)(k + 1)}. \tag{5.55}$$


CHAPTER  6 

GAMES  OF  SURVIVAL 
6.1. Introduction

In this chapter we shall present some results concerning a class of games, which we
call "games of survival," in which two players with finite fortunes, $f_1$ and $f_2$,
respectively, in chips play a normalized finite zero-sum two-person game. The game is continued
until  the  fortune  of  one  of  the  players  is  reduced  to  zero,  or  ad  infinitum  if  this  never 
occurs.  The  payoff  in  money  is  (1,0)  if  player  two  is  ruined  before  player  one,  and 
(0,  1)  if  the  reverse  holds. 

Another  way  of  viewing  this  is  that  each  player  is  playing  so  as  to  maximize  the  prob¬ 
ability  that  he  will  survive  his  opponent. 

We  shall  first  consider  a  simple  game  using  the  functional-equation  approach  of  the 
previous  chapters,  and  then  present  a  more  powerful  technique  that  utilizes  more  of  the 
actual  structure  of  the  process. 


6.2. The 2 × 2 Game

Let us consider the situation in which two players, A and B, possessing fortunes $x$
and $y$, respectively, play the zero-sum game defined by the matrix

$$\Gamma = \begin{pmatrix} -1 & a \\ c & -b \end{pmatrix}, \tag{6.1}$$

where $a$, $b$, and $c$ are positive integers, with the purpose in mind of ruining the opponent.

Since the game is zero-sum, we shall set $x + y = d$ and specify the state of the
fortunes of the players by $x$, the quantity held by A. Let us define, for $0 \le x \le d$, $x$
integral,

$f(x)$ = probability that B is ruined before A when A has $x$ and both players
use optimal play, (6.2)

setting

$$f(x) = 0, \quad x \le 0; \qquad f(x) = 1, \quad x \ge d. \tag{6.3}$$

If this function exists, it satisfies the equation

$$\begin{aligned}
f(x) &= \min_{q} \max_{p}\,[\,p_1 q_1 f(x - 1) + p_1 q_2 f(x + a) + p_2 q_1 f(x + c) + p_2 q_2 f(x - b)\,] \\
     &= \max_{p} \min_{q}\,[\,\cdots\,],
\end{aligned} \tag{6.4}$$

for $x = 1, 2, \ldots, d - 1$.

To simplify the formulas which occur, we shall set $F[f(x)]$ as the value of the game
whose matrix is

$$\begin{pmatrix} f(x - 1) & f(x + a) \\ f(x + c) & f(x - b) \end{pmatrix}. \tag{6.5}$$

We shall use the notation $F(M)$ to denote the value of the game whose matrix is $M$.
The functional equation of (6.4) therefore has the form

$$f(x) = F[f(x)], \quad x = 1, 2, \ldots, d - 1; \qquad f(x) = 0, \quad x \le 0; \qquad f(x) = 1, \quad x \ge d. \tag{6.6}$$

Although it is not immediately seen that $f(x)$ exists, there is no difficulty in defining

$f_n(x)$ = probability that B is ruined before A when $n$ rounds of the game are
played with both sides using optimal play and A possessing $x$. (6.7)

This function satisfies the equations

$$\begin{aligned}
f_0(x) &= 1, \quad x \ge d, \\
       &= 0, \quad x \le d - 1, \\
f_{n+1}(x) &= F[f_n(x)], \quad n = 0, 1, \ldots, \quad x = 1, 2, \ldots, d - 1, \\
f_{n+1}(x) &= 1, \quad x \ge d, \\
           &= 0, \quad x \le 0,
\end{aligned} \tag{6.8}$$

assuming that in the $n$-stage process A plays to maximize this probability, and B plays to
minimize it. The situation is unsymmetrical, since there is always in the $n$-stage process a
non-zero probability that both sides survive. As $n \to \infty$, this probability approaches zero,
and the situation becomes symmetrical.

It is clear that $f_1(x) \ge f_0(x)$ for all $x$, and hence, inductively, that $f_{n+1}(x) \ge f_n(x)$.
It follows from the trivial observation that $0 \le f_n(x) \le 1$ for all $x$ and $n$ that $f_n(x)$
converges, as $n \to \infty$, for all $x$ to a function that we call $f(x)$. That $f(x)$ satisfies (6.6) is
a consequence of the fact that the value of a game is a continuous function of the
game matrix.
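The successive approximations $f_n(x)$ are easy to compute, since each step requires only the value of a $2 \times 2$ matrix game. The sketch below is a minimal illustration: the function names and the sample parameters are assumptions, and the game value is obtained from the standard saddle-point test followed by the mixed-strategy formula for $2 \times 2$ games, with A as the maximizing row player.

```python
def value_2x2(m11, m12, m21, m22):
    """Value of a 2 x 2 zero-sum game for the row (maximising) player."""
    maximin = max(min(m11, m12), min(m21, m22))
    minimax = min(max(m11, m21), max(m12, m22))
    if maximin >= minimax:               # saddle point in pure strategies
        return maximin
    return (m11 * m22 - m12 * m21) / (m11 + m22 - m12 - m21)


def survival_probabilities(a, b, c, d, rounds):
    """Iterate f_{n+1}(x) = F[f_n(x)]; returns [f(0), f(1), ..., f(d)]."""

    def at(f, x):
        # Boundary values: 0 once A is ruined, 1 once B is ruined.
        return 0.0 if x <= 0 else 1.0 if x >= d else f[x]

    f = [0.0] * d + [1.0]                # f_0: 0 for x <= d - 1, 1 at x = d
    for _ in range(rounds):
        g = [0.0] * (d + 1)
        g[d] = 1.0
        for x in range(1, d):
            g[x] = value_2x2(at(f, x - 1), at(f, x + a), at(f, x + c), at(f, x - b))
        f = g
    return f


# Hypothetical parameters. With a = b = c = 1 the game matrix at each x is
# of matching-pennies type and the limit is the linear function x / d.
print(survival_probabilities(1, 1, 1, 4, 400))
print(survival_probabilities(2, 1, 1, 5, 2000))
```

In the symmetric case the iterates converge to x/d, which is the ruin probability of a fair unit-step random walk; in the second case the limit is no longer linear but is still increasing in x.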

Since $f_0(x)$, and consequently each $f_n(x)$, is a monotone-increasing function of $x$, it
follows that $f(x)$ is monotone. Let us now demonstrate the important result that it is
actually strictly monotone.

We have

$$f(1) = F\begin{pmatrix} 0 & f(1 + a) \\ f(1 + c) & 0 \end{pmatrix}. \tag{6.9}$$

If $f(a)$ and $f(c)$ are positive, then $f(1) > 0$. Let us assume, to the contrary, that
$f(x) = 0$ for $x = 0, 1, 2, \ldots, k < d$, but $f(k + 1) \ne 0$. That a $k$ with this property
exists is clear. Then

$$f(k) = F\begin{pmatrix} f(k - 1) & f(k + a) \\ f(k + c) & f(k - b) \end{pmatrix} = F\begin{pmatrix} 0 & f(k + a) \\ f(k + c) & 0 \end{pmatrix}. \tag{6.10}$$

Since $f(k + a) \ge f(k + 1) > 0$, $f(k + c) \ge f(k + 1) > 0$, it follows that $f(k) > 0$,
which is a contradiction, unless $k = 0$. Thus, $f(1) > 0$.

Now,

$$f(2) = F\begin{pmatrix} f(1) & f(a + 2) \\ f(c + 2) & f(2 - b) \end{pmatrix}. \tag{6.11}$$

Since $f(1) > 0$, $f(a + 2) \ge f(a + 1)$, $f(c + 2) \ge f(c + 1)$, $f(2 - b) \ge 0$, we must
have $f(2) > f(1)$, unless $f(2 - b) = 0$ and the solution of the game is $p_2 = q_2 = 1$.
This  is  clearly  not  so,  since  =  1  is  a  better  response  to  p2  =  1.  Similarly,  we  prove, 
using  induction,  that 


$$0 = f(0) < f(1) < f(2) < \cdots < f(d) = 1, \tag{6.12}$$

with  strict  inequality  at  every  step. 

With these preliminaries disposed of, we now turn to the question of uniqueness. Let
us set

$$T(p, q, f) = p_1 q_1 f(x - 1) + p_1 q_2 f(x + a) + p_2 q_1 f(x + c) + p_2 q_2 f(x - b). \tag{6.13}$$

Let $f$ and $g$ be solutions of

$$\begin{aligned}
f(x) &= \min_{q} \max_{p} T(p, q, f) = \max_{p} \min_{q} T(p, q, f), \\
g(x) &= \min_{q} \max_{p} T(p, q, g) = \max_{p} \min_{q} T(p, q, g),
\end{aligned} \tag{6.14}$$

satisfying the boundary conditions

$$f(x) = g(x) = 0, \quad x \le 0; \qquad f(x) = g(x) = 1, \quad x \ge d, \tag{6.15}$$

with the further assumption that $g(x)$ is uniformly bounded.

Under the assumption that $f(x) \not\equiv g(x)$, let

$$\Delta = \max_{x} |f(x) - g(x)|, \tag{6.16}$$

and let $y$ be the largest integer in $[0, d]$ for which this maximum, assumed to be not
equal to zero, is attained.

If we set $p_i = p_i(y)$, $q_i = q_i(y)$, $\bar{p}_i = \bar{p}_i(y)$, $\bar{q}_i = \bar{q}_i(y)$ to be sets of values for
which the min-max is assumed for $f$ and for $g$, respectively, we have

$$f(y) = T(p, q, f), \qquad g(y) = T(\bar{p}, \bar{q}, g). \tag{6.17}$$

From the properties of min-max, we have

$$\begin{aligned}
f(y) = T(p, q, f) &\ge T(\bar{p}, q, f) && \text{(a)} \\
                  &\le T(p, \bar{q}, f), && \text{(b)}
\end{aligned} \tag{6.18}$$

$$\begin{aligned}
g(y) = T(\bar{p}, \bar{q}, g) &\le T(\bar{p}, q, g) && \text{(a)} \\
                              &\ge T(p, \bar{q}, g). && \text{(b)}
\end{aligned} \tag{6.19}$$

Combining (6.18a) with (6.19a), we obtain

$$f(y) - g(y) \ge T(\bar{p}, q, f) - T(\bar{p}, q, g) = T(\bar{p}, q, f - g), \tag{6.20}$$

while (6.18b) and (6.19b) yield

$$f(y) - g(y) \le T(p, \bar{q}, f) - T(p, \bar{q}, g) = T(p, \bar{q}, f - g). \tag{6.21}$$

From these two inequalities, we conclude that

$$\Delta = |f(y) - g(y)| \le \max\,[\,|T(\bar{p}, q, f - g)|,\; |T(p, \bar{q}, f - g)|\,]. \tag{6.22}$$

Since

$$|T(\bar{p}, q, f - g)| \le T(\bar{p}, q, \Delta) = \Delta, \qquad |T(p, \bar{q}, f - g)| \le T(p, \bar{q}, \Delta) = \Delta, \tag{6.23}$$

we conclude that (6.22) is actually an equality, which means that the inequalities in
(6.20) and (6.21) must also be equalities.

Consider the relation

$$f(y) - g(y) = \sum_{i,j} \bar{p}_i q_j\,[\,f(y + \alpha_{ij}) - g(y + \alpha_{ij})\,], \tag{6.24}$$

where we set

$$\begin{pmatrix} \alpha_{11} & \alpha_{12} \\ \alpha_{21} & \alpha_{22} \end{pmatrix} = \begin{pmatrix} -1 & a \\ c & -b \end{pmatrix}. \tag{6.25}$$

Since $\sum_{i,j} \bar{p}_i q_j = 1$, if $|f(y + \alpha_{ij}) - g(y + \alpha_{ij})| < \Delta$, $\bar{p}_i q_j$ must be zero. By
assumption, $y$ was the largest integer in $[0, d]$ for which $|f(x) - g(x)| = \Delta$. Hence,
$\bar{p}_i q_j = 0$ whenever $\alpha_{ij} > 0$.

It follows that $\bar{p}_1 q_2 = 0$, $\bar{p}_2 q_1 = 0$. Since $\bar{p}_1 + \bar{p}_2 = 1$, both $\bar{p}_1$ and $\bar{p}_2$ cannot be
zero, which means that $q_1$ or $q_2 = 0$. Coming back to the game matrix

$$\begin{pmatrix} f(x - 1) & f(x + a) \\ f(x + c) & f(x - b) \end{pmatrix}, \tag{6.26}$$

we see that the strict monotonicity of $f(x)$ makes it impossible for $q_1 = 0$ or $q_2 = 0$ to
be optimal play for B for $x = y$.

We have thus obtained the desired contradiction.

The method we have employed is quite general and can be used to treat many
particular types of $m \times n$ games. The general case, however, in which one only assumes that
the entries in the matrix are positive or negative integers still presents difficulties.

6.3. More General Results

Let us now consider the game $\Omega(f_1, f_2)$ characterized by the payoff matrix, $(\Gamma_{ij})$, in
which the elements are non-zero integers, and the finite fortunes $f_1$ and $f_2$ of each
player. We shall show that $\Omega$ is inessential and has some easily described optimal
strategies.* We shall also show that if $\max_{ij} |\Gamma_{ij}|$ is small enough compared with the
combined fortunes, then to play at the $n$th play a $\delta^n$-optimal strategy for $\Gamma$ is an
$\epsilon$-optimal strategy for $\Omega$, if $\delta$ is sufficiently small. ($\delta^n$ is the $n$th power of $\delta$.)

* In an inessential game, an optimal strategy for a player is one that secures for him the
maximum amount he can ensure for himself. An $\epsilon$-optimal strategy secures for him at least
that amount less $\epsilon$.

We assume that every column of $\Gamma$ has a positive entry and that every row has a
negative entry. Otherwise, there would be a negative column or a positive row. In the first
case, player 2 can always force player 1's fortune to become non-positive by repeatedly
playing the negative column. In the second case, player 1 can force player 2's fortune to
become non-positive by repeatedly playing the positive row.

Let $\Omega^{(n)}(f_1, f_2)$ be the game in which two players repeat $\Gamma$ $n$ times, or until one of the
players has a non-positive fortune, if this occurs first. The payoff in money is (0, 1) if
player 1 ends with a non-positive fortune, and (1, 0) otherwise. $\Omega^{(n)}(f_1, f_2)$ is a
constant-sum two-person game with value, say, $[v^{(n)}(f_1, f_2),\, 1 - v^{(n)}(f_1, f_2)]$. We observe that

1. Player 2 can always win as much money in $\Omega^{(n+1)}$ as in $\Omega^{(n)}$ by playing an
$\Omega^{(n)}$-optimal strategy during the first $n$ moves of $\Omega^{(n+1)}$ and by playing arbitrarily
on the $(n + 1)$th move. Hence,

$$v^{(n+1)}(f_1, f_2) \le v^{(n)}(f_1, f_2).$$

2. Since each column has a positive entry, by repeatedly playing the strategy that
assigns each pure strategy probability $1/I_0$, player 1 ensures that no matter what
player 2 does, player 2's fortune will decrease each time with probability at least
$1/I_0$. Player 1 thereby ensures, with a probability of at least $(1/I_0)^{[f_2]+1}$, that player
2 will be bankrupted in at most $[f_2] + 1$ trials. ($[f_2]$ is the largest integer not
larger than $f_2$.) Hence, if $n \ge [f_2] + 1$ and $(f_1, f_2) > (0, 0)$, then

$$v^{(n)}(f_1, f_2) \in [\delta, 1],$$

where $\delta = (1/I_0)^{[f_2]+1}$. By definition, we have also

$$v^{(n)}(f_1, f_2) = 0 \quad \text{if } f_1 \le 0$$

and

$$v^{(n)}(f_1, f_2) = 1 \quad \text{if } f_2 \le 0.$$

3. Let $G(A)$ be the game value of $A$ for each game $A$. If $(f_1, f_2) > 0$, after one
move of $\Omega^{(n+1)}(f_1, f_2)$, the players are playing $\Omega^{(n)}(f_1 + \Gamma_{ij},\, f_2 - \Gamma_{ij})$. Hence,

$$v^{(n+1)}(f_1, f_2) = G[\,v^{(n)}(f_1 + \Gamma_{ij},\, f_2 - \Gamma_{ij})\,].$$

4. Let $\epsilon > 0$. Player 1 can always win as much in $\Omega^{(n)}(f_1 + \epsilon,\, f_2 - \epsilon)$ as in
$\Omega^{(n)}(f_1, f_2)$. Hence,

$$v^{(n)}(f_1 + \epsilon,\, f_2 - \epsilon) \ge v^{(n)}(f_1, f_2).$$

We can now conclude:

(a) From (1) and (2),

$$v^{(n)}(f_1, f_2) \to v(f_1, f_2) \in [\delta, 1] \quad \text{if } (f_1, f_2) > (0, 0); \qquad = 0 \quad \text{if } f_1 \le 0; \qquad = 1 \quad \text{if } f_2 \le 0.$$

(b) From (3), if $(f_1, f_2) > (0, 0)$,

$$v(f_1, f_2) = G[\,v(f_1 + \Gamma_{ij},\, f_2 - \Gamma_{ij})\,].$$

(c) From (4), for $\epsilon > 0$,

$$v(f_1 + \epsilon,\, f_2 - \epsilon) \ge v(f_1, f_2).$$

Definition. A strategy for player 1 is called conditionally optimal if the conditional
distribution of his strategy at any play of $\Gamma$, given the course of the game up to that play,
is an optimal strategy for the game $[\,v(\phi_1 + \Gamma_{ij},\, \phi_2 - \Gamma_{ij})\,]$, where $(\phi_1, \phi_2)$ is the
fortune distribution immediately before the play in question.

Lemma 6.1. If player 1's strategy is conditionally optimal, and if with probability 1
the fortune of one of the players (not necessarily always the same one) eventually
becomes non-positive, then player 1 can expect at least $v(f_1, f_2)$ in payoff.

Proof. It is sufficient to show that the probability that player 2's fortune becomes
non-positive is at least $v(f_1, f_2)$. Let $[(F_1^n, F_2^n) \mid n \ge 1]$ be the random variable of
fortunes at play $n$, where, if the game ends at play $N$, $(F_1^{N+j}, F_2^{N+j}) = (F_1^N, F_2^N)$ for $j \ge 1$.
Then, since player 1's strategy is conditionally optimal, if $(F_1^n, F_2^n) > (0, 0)$,

$$E\,v(F_1^{n+1}, F_2^{n+1}) \ge E\,G[\,v(F_1^n + \Gamma_{ij},\, F_2^n - \Gamma_{ij})\,] = E\,v(F_1^n, F_2^n),$$

whereas, otherwise,

$$E\,v(F_1^{n+1}, F_2^{n+1}) = E\,v(F_1^n, F_2^n).$$

Hence, by induction,

$$E\,v(F_1^n, F_2^n) \ge v(f_1, f_2).$$

Let $[(P_1^n, P_2^n) \mid n \ge 1]$ be the random variable that is

(0, 0) if neither player's fortune is non-positive by the end of the $n$th play,

(0, 1) if the first player's fortune is non-positive by the end of the $n$th play,

(1, 0) if the second player's fortune is non-positive by the end of the $n$th play.

Then

$$E\,v(F_1^n, F_2^n) \le E\,P_1^n + E(1 - P_1^n - P_2^n).$$

But, by assumption, the second term on the right tends to zero. Hence, where $\epsilon_n \to 0$,

$$E\,P_1^n + \epsilon_n \ge v(f_1, f_2),$$

which is the desired result.

Lemma  6.2.  There  is  a  conditionally  optimal  strategy  for  the  first  player  which  en¬ 
sures  that  the  probability  that  the  game  ends  by  the  nth  play  tends  uniformly  to  1  as  n 
tends  to  oo  in  the  opponent' s  strategy. 

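Proof. First, we show that for each $(\phi_1, \phi_2) > (0, 0)$ there is an optimal strategy $I$ for the first player for the game $[\bar v(\phi_1 + \Gamma_{ij}, \phi_2 - \Gamma_{ij})]$ such that for all $J$, $\Pr\{\Gamma_{IJ} > 0\} > 0$. Suppose, on the contrary, that for some $(\phi_1, \phi_2) > (0, 0)$, for all optimal $I$, there is a $J$ such that $\Pr\{\Gamma_{IJ} > 0\} = 0$, or, since $\Gamma_{ij} \ne 0$, $\Pr\{\Gamma_{IJ} < 0\} = 1$, which is the same thing. Then, since player 1 is playing optimally,

$$\bar v(\phi_1, \phi_2) \le E\,\bar v(\phi_1 + \Gamma_{IJ}, \phi_2 - \Gamma_{IJ}).$$

From the monotonicity of $\bar v(\phi_1 + \epsilon, \phi_2 - \epsilon)$,

$$\bar v(\phi_1, \phi_2) \ge E\,\bar v(\phi_1 + \Gamma_{IJ}, \phi_2 - \Gamma_{IJ}).$$

Combining,

$$\bar v(\phi_1, \phi_2) = E\,\bar v(\phi_1 + \Gamma_{IJ}, \phi_2 - \Gamma_{IJ}),$$

or, weaker, from monotonicity again,

$$\bar v(\phi_1, \phi_2) = \bar v(\phi_1 - 1, \phi_2 + 1).$$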

If $(\phi_1 - 1, \phi_2 + 1) > (0, 0)$, this implies that an optimal strategy $I$ for the first player for the game $[\bar v(\phi_1 + \Gamma_{ij} - 1, \phi_2 - \Gamma_{ij} + 1)]$ is an optimal strategy for $[\bar v(\phi_1 + \Gamma_{ij}, \phi_2 - \Gamma_{ij})]$, since by using it against any $J$, the first player ensures for himself

$$E\,\bar v(\phi_1 + \Gamma_{IJ}, \phi_2 - \Gamma_{IJ}) \ge E\,\bar v(\phi_1 + \Gamma_{IJ} - 1, \phi_2 - \Gamma_{IJ} + 1) \ge \bar v(\phi_1 - 1, \phi_2 + 1) = \bar v(\phi_1, \phi_2).$$

Thus, for a fortune division $(\phi_1 - 1, \phi_2 + 1) > (0, 0)$, and by induction for a fortune division $(\phi_1 - n, \phi_2 + n) > (0, 0)$, for all optimal strategies $I$, there is a $J$ such that $\Gamma_{IJ} < 0$. But eventually, perhaps for $n = 0$,

$$(\phi_1 - n, \phi_2 + n) > (0, 0),$$

whereas $\phi_1 \le n + 1$. Therefore, for an optimal $I$ and some $J$,

$$0 < \delta \le \bar v(\phi_1 - n, \phi_2 + n) \le E\,\bar v(\phi_1 - n + \Gamma_{IJ}, \phi_2 + n - \Gamma_{IJ}) \le \bar v(\phi_1 - n - 1, \phi_2 + n + 1) = 0,$$

which is the contradiction for which we have been looking.

We have now proved that for $(\phi_1, \phi_2) > (0, 0)$, there is an optimal $I$ such that for all $J$, $\Pr\{\Gamma_{IJ} > 0\} > 0$. For each $(\phi_1, \phi_2) > (0, 0)$, fix such an $I$. Call it $I(\phi_1, \phi_2)$. From the compactness of the second player's set of strategies and the fact that $\Pr\{\Gamma_{IJ} > 0\}$ is a continuous function of his strategy, $\Pr\{\Gamma_{IJ} > 0\} \ge p(\phi_1, \phi_2) > 0$. Define $\sigma(\phi_1, \phi_2) = \min_k p(\phi_1 + k, \phi_2 - k) > 0$, where $k$ is an arbitrary positive, zero, or negative integer such that $(\phi_1 + k, \phi_2 - k) > (0, 0)$.

Now let player 1 use the conditionally optimal strategy that consists in playing $I(\phi_1, \phi_2)$ when the fortune distribution is $(\phi_1, \phi_2)$. Let $Q^{(n)}$ be the probability that one player's fortune or the other's is exhausted on or before the $n$th play. Then, where $\sigma = \sigma(f_1, f_2)$,

$$Q^{(0)} \ge 0,$$

$$Q^{(n + [f_1 + f_2] + 1)} \ge Q^{(n)} + (1 - Q^{(n)})\,\sigma^{[f_1 + f_2] + 1}.$$

By induction,

$$Q^{(N([f_1 + f_2] + 1))} \ge 1 - (1 - \sigma^{[f_1 + f_2] + 1})^N.$$

Hence, $Q^{(N)} \to 1$ as $N \to \infty$, which is the lemma.

Let $\underline\Omega^{(n)}(f_1, f_2)$ be the game in which the two players repeat $\Gamma$ $n$ times, or until one of the players has a non-positive fortune, if this occurs first, and the money payoff is (1, 0) if player 2 ends with a non-positive fortune, or is (0, 1), otherwise. $\underline\Omega^{(n)}(f_1, f_2)$ is a constant-sum two-person game with value $[\underline v^{(n)}(f_1, f_2), 1 - \underline v^{(n)}(f_1, f_2)]$. Obviously,

$$\underline v^{(n)}(f_1, f_2) \le \underline v^{(n+1)}(f_1, f_2),$$

since any strategy for player 1 in $\underline\Omega^{(n)}(f_1, f_2)$ will ensure him as much money in $\underline\Omega^{(n+1)}(f_1, f_2)$. We therefore conclude, by the same reasoning as that stated earlier, that


(a') $\underline v^{(n)}(f_1, f_2) \to \underline v(f_1, f_2)$, where

$\underline v(f_1, f_2) \in [0, 1 - \delta']$, with $\delta' > 0$, if $(f_1, f_2) > (0, 0)$,

$\underline v(f_1, f_2) = 0$ if $f_1 \le 0$,

$\underline v(f_1, f_2) = 1$ if $f_2 \le 0$;

(b') $\underline v(f_1, f_2) = G[\underline v(f_1 + \Gamma_{ij}, f_2 - \Gamma_{ij})]$ if $(f_1, f_2) > (0, 0)$;

(c') $\underline v(f_1 + \epsilon, f_2 - \epsilon) \ge \underline v(f_1, f_2)$ if $\epsilon \ge 0$.

In addition,

(d') $\underline v(f_1, f_2) \le \bar v(f_1, f_2)$.

Definition. A strategy for player 2 is called conditionally optimal if the conditional distribution of his strategy at any play of $\Gamma$, given the course of the game up to that play, is an optimal strategy for the game $[\underline v(\phi_1 + \Gamma_{ij}, \phi_2 - \Gamma_{ij})]$, where $(\phi_1, \phi_2)$ is the fortune distribution immediately before the play in question.

From Lemmas 6.1 and 6.2 and from the analogous Lemmas 6.1' and 6.2' that we do not write down, we conclude that each player has a conditionally optimal strategy which ensures that play ends by the $n$th play with probability tending to 1 as $n$ tends to $\infty$, uniformly in the opponent's strategy. The first player's strategy ensures him $\bar v(f_1, f_2)$ on the average, and the second player's strategy ensures him $1 - \underline v(f_1, f_2) \ge 1 - \bar v(f_1, f_2)$ on the average. Since together the players can win no more than 1, we get

$$1 \ge \bar v(f_1, f_2) + [1 - \underline v(f_1, f_2)] \ge \bar v(f_1, f_2) + [1 - \bar v(f_1, f_2)] = 1.$$

This means that $\underline v(f_1, f_2) = \bar v(f_1, f_2) =$ (say) $v(f_1, f_2)$, and that $\Omega(f_1, f_2)$ is inessential with the solution $[v(f_1, f_2), 1 - v(f_1, f_2)]$.

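$v(\phi_1, \phi_2)$ can be characterized as being the unique solution of

$$0 < v(\phi_1, \phi_2) = G[v(\phi_1 + \Gamma_{ij}, \phi_2 - \Gamma_{ij})] < 1 \quad \text{if } (\phi_1, \phi_2) > (0, 0),$$

$$v(\phi_1, \phi_2) = 0 \quad \text{if } \phi_1 \le 0,$$

$$v(\phi_1, \phi_2) = 1 \quad \text{if } \phi_2 \le 0.$$

For, if $v^*$ is a solution,

$$\underline v^{(0)}(\phi_1, \phi_2) \le v^*(\phi_1, \phi_2) \le \bar v^{(0)}(\phi_1, \phi_2)$$

by definition, and so, by induction, using (a), (b), (a'), and (b'),

$$\underline v^{(n)}(\phi_1, \phi_2) \le v^*(\phi_1, \phi_2) \le \bar v^{(n)}(\phi_1, \phi_2).$$

Hence,

$$v(\phi_1, \phi_2) = \underline v(\phi_1, \phi_2) \le v^*(\phi_1, \phi_2) \le \bar v(\phi_1, \phi_2) = v(\phi_1, \phi_2),$$

giving

$$v(\phi_1, \phi_2) = v^*(\phi_1, \phi_2),$$

as was to be proved.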

We  thus  have 

Theorem 6.1. $\Omega(f_1, f_2)$ is inessential with the solution $[v(f_1, f_2), 1 - v(f_1, f_2)]$, where $v$ is the unique solution in $\{(\phi_1, \phi_2) \mid \phi_1 > 0 \text{ or } \phi_2 > 0\}$ of

$$0 < v(\phi_1, \phi_2) = G[v(\phi_1 + \Gamma_{ij}, \phi_2 - \Gamma_{ij})] < 1 \quad \text{if } (\phi_1, \phi_2) > (0, 0),$$

$$v(\phi_1, \phi_2) = 0 \quad \text{if } \phi_1 \le 0,$$

$$v(\phi_1, \phi_2) = 1 \quad \text{if } \phi_2 \le 0.$$

Each player has a conditionally optimal strategy that is optimal and which ensures that play ends by the $n$th play with probability tending to 1, uniformly in the opponent's strategies.

Let us turn now to the problem of the effective computation of an $\epsilon$-optimal strategy for $\Omega(f_1, f_2)$. This is easy, if we are not interested in efficiency. That is, we need only find an $n$ such that $\bar v^{(n)}(f_1, f_2) - \underline v^{(n)}(f_1, f_2) \le \epsilon - \delta$, where $\delta > 0$. Then a $\delta$-optimal strategy for the first player for $\underline\Omega^{(n)}(f_1, f_2)$ provides an $\epsilon$-optimal strategy for him for $\Omega(f_1, f_2)$. Thus, he can use the strategy on the first $n$ moves of $\Omega(f_1, f_2)$ and act arbitrarily thereafter. Similarly, a $\delta$-optimal strategy for the second player for $\bar\Omega^{(n)}(f_1, f_2)$ provides an $\epsilon$-optimal strategy for him for $\Omega(f_1, f_2)$.
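The search for a suitable $n$ can be sketched numerically. The following is only an illustration under stated assumptions, not the text's own procedure: it assumes a 2x2 payoff matrix with non-zero integer entries, integer fortunes, and the usual closed-form value of a 2x2 zero-sum game in place of a general matrix-game solver. The names `value_2x2` and `truncated_bounds` are hypothetical. Starting from 0 and from 1 in the interior, it iterates the functional equation until the two iterates at $f_1$ differ by at most a tolerance.

```python
def value_2x2(m):
    """Value of a 2x2 zero-sum matrix game; the row player maximizes."""
    (a, b), (c, d) = m
    maximin = max(min(a, b), min(c, d))
    minimax = min(max(a, c), max(b, d))
    if maximin == minimax:
        return maximin                          # pure-strategy saddle point
    return (a * d - b * c) / (a + d - b - c)    # mixed-strategy value


def truncated_bounds(gamma, f1, f2, eps, max_iter=10_000):
    """Iterate v(phi) <- G[v(phi + gamma_ij)] from below (start at 0) and from
    above (start at 1) until the two iterates at f1 differ by at most eps."""
    f = f1 + f2

    def lookup(table, phi):
        if phi <= 0:
            return 0.0      # player 1's fortune is non-positive
        if phi >= f:
            return 1.0      # player 2's fortune is non-positive
        return table[phi]

    def step(table):
        return {
            phi: value_2x2([[lookup(table, phi + g) for g in row] for row in gamma])
            for phi in range(1, f)
        }

    lower = {phi: 0.0 for phi in range(1, f)}
    upper = {phi: 1.0 for phi in range(1, f)}
    n = 0
    while upper[f1] - lower[f1] > eps and n < max_iter:
        lower, upper = step(lower), step(upper)
        n += 1
    return lower[f1], upper[f1], n


# Assumed example: matching pennies, each player starting with 2 units.
matching_pennies = [[1, -1], [-1, 1]]
lo, hi, n = truncated_bounds(matching_pennies, f1=2, f2=2, eps=1e-9)
print(lo, hi, n)
```

For this symmetric example both bounds approach 1/2, the fair-random-walk ruin probability $f_1/(f_1 + f_2)$. For larger matrices the 2x2 formula would have to be replaced by a linear-programming solver.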

If $\max_{i,j} |\Gamma_{ij}|$ is small enough compared with $f_1$ and $f_2$, another class of interesting $\epsilon$-optimal strategies exists. The repeated playing of an optimal strategy for $\Gamma$ is an $\epsilon$-optimal strategy for $\Omega$. More precisely, let us remove the restriction that each $\Gamma_{ij}$ be a non-zero integer. Let us require instead, say, that $G(\Gamma) \ge 0$ and that for some optimal strategy $I$, $\Pr\{\Gamma_{Ij} > 0\} > 0$ for all $j$. If $G(\Gamma) = 0$, we require in addition that for some optimal $J$, $\Pr\{\Gamma_{iJ} < 0\} > 0$ for all $i$. Define $\alpha = G(\Gamma)$, $\beta = \min_j \Pr\{\Gamma_{Ij} > 0\}$, $\gamma = \max_{i,j} |\Gamma_{ij}|$.

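We assume that both $f_1$ and $f_2$ are positive and define $f = f_1 + f_2$. Define, for $\alpha = 0$,

$$p_0(\phi_1) = \frac{\phi_1}{f + \gamma} \quad \text{if } 0 < \phi_1 < f,$$

$$p_0(\phi_1) = 0 \quad \text{if } \phi_1 \le 0,$$

$$p_0(\phi_1) = 1 \quad \text{if } \phi_1 \ge f,$$

and for $\alpha > 0$,

$$p_\alpha(\phi_1) = \frac{1 - \exp\{-(\alpha/\gamma^2)\,\phi_1\}}{1 - \exp\{-(\alpha/\gamma^2)(f + \gamma)\}} \quad \text{if } 0 < \phi_1 < f,$$

$$p_\alpha(\phi_1) = 0 \quad \text{if } \phi_1 \le 0,$$

$$p_\alpha(\phi_1) = 1 \quad \text{if } \phi_1 \ge f.$$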


Lemma 6.3. If player 1 plays $I$ repeatedly, then he can expect at least $p_\alpha(f_1)$ in payoff. ($I$ is any optimal strategy for $\Gamma$ satisfying $\Pr(\Gamma_{Ij} > 0) > 0$.)

Proof. Since $\beta > 0$, by the method of proof of Lemma 6.2, it follows that if player 1 plays $I$ repeatedly, the probability that the game ends by the $n$th play tends to 1 as $n$ tends to $\infty$. Hence, in order to prove Lemma 6.3, it is sufficient to show that for all $N$,

$$E\,p_\alpha(F_1^N) \ge p_\alpha(f_1).$$

By induction, this would follow from

$$E\{p_\alpha(F_1^{N+1}) \mid F_1^N\} \ge p_\alpha(F_1^N).$$

We prove the latter.

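Suppose that $\alpha = 0$. If $0 < F_1^N < f$, then for all $(i, j)$, since $|\Gamma_{ij}| \le \gamma$,

$$p_0(F_1^N + \Gamma_{ij}) \ge \frac{1}{f + \gamma}\,(F_1^N + \Gamma_{ij}).$$

Hence, if $0 < F_1^N < f$,

$$E\{p_0(F_1^{N+1}) \mid F_1^N\} \ge \min_j E\,p_0(F_1^N + \Gamma_{Ij}) \ge \frac{1}{f + \gamma} \min_j E(F_1^N + \Gamma_{Ij}) \ge \frac{1}{f + \gamma}\,F_1^N = p_0(F_1^N).$$

Since if $F_1^N \le 0$ or $F_1^N \ge f$ our proposition is trivial, we have disposed of the case $\alpha = 0$.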

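Suppose now that $\alpha > 0$. Again, we need only consider $0 < F_1^N < f$. Then

$$p_\alpha(F_1^N + \Gamma_{ij}) \ge \frac{1 - \exp\{-(\alpha/\gamma^2)(F_1^N + \Gamma_{ij})\}}{1 - \exp\{-(\alpha/\gamma^2)(f + \gamma)\}}.$$

Hence,

$$E\{p_\alpha(F_1^{N+1}) \mid F_1^N\} \ge \min_j E\,p_\alpha(F_1^N + \Gamma_{Ij}) \ge \min_j \frac{1 - E\exp\{-(\alpha/\gamma^2)(F_1^N + \Gamma_{Ij})\}}{1 - \exp\{-(\alpha/\gamma^2)(f + \gamma)\}} = \frac{1 - M\exp\{-(\alpha/\gamma^2)F_1^N\}}{1 - \exp\{-(\alpha/\gamma^2)(f + \gamma)\}},$$

where

$$M = \max_j E\exp\{-(\alpha/\gamma^2)\,\Gamma_{Ij}\} \le \max_j \left\{1 - \frac{\alpha}{\gamma^2}\,E\,\Gamma_{Ij} + (e - 2)\frac{\alpha^2}{\gamma^2}\right\} \le 1.$$

Hence,

$$E\{p_\alpha(F_1^{N+1}) \mid F_1^N\} \ge \frac{1 - \exp\{-(\alpha/\gamma^2)F_1^N\}}{1 - \exp\{-(\alpha/\gamma^2)(f + \gamma)\}} = p_\alpha(F_1^N),$$

as was to be proved.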

By symmetry, if $\alpha = 0$, we conclude

Lemma 6.4. If $\alpha = 0$, and if player 2 plays $J$ repeatedly, then he can expect at least $p_0(f_2)$ in payoff. ($J$ is any optimal strategy for $\Gamma$ satisfying $\Pr(\Gamma_{iJ} < 0) > 0$.)

If $\alpha = 0$, Lemmas 6.3 and 6.4 give us, whenever $\Omega(f_1, f_2)$ is inessential with the solution $\{[v(f_1, f_2), 1 - v(f_1, f_2)]\}$,

$$p_0(f_1) \le v(f_1, f_2) \le 1 - p_0(f_2) = p_0(f_1) + \frac{\gamma}{f + \gamma}.$$

Thus, repeating $I$ is $[\gamma/(f + \gamma)]$-optimal for player 1, and repeating $J$ is $[\gamma/(f + \gamma)]$-optimal for player 2. If $\alpha > 0$, Lemma 6.3 gives us, whenever $\Omega(f_1, f_2)$ is inessential with the solution $\{[v(f_1, f_2), 1 - v(f_1, f_2)]\}$,

$$1 - \exp\{-(\alpha/\gamma^2) f_1\} \le p_\alpha(f_1) \le v(f_1, f_2) \le 1.$$

Thus, repeating $I$ is $\exp\{-(\alpha/\gamma^2) f_1\}$-optimal for player 1, and any strategy is $\exp\{-(\alpha/\gamma^2) f_1\}$-optimal for player 2.
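These bounds are easy to evaluate. The sketch below is an illustration only: the function name `p_alpha`, the boundary convention (0 for a non-positive argument, 1 at or above $f$), and the sample fortunes are assumptions made for the example. It computes the interval of Lemmas 6.3 and 6.4 for $\alpha = 0$ and the exponential slack for $\alpha > 0$.

```python
import math


def p_alpha(phi1, f, alpha, gamma):
    """Bounding function p_alpha(phi1); alpha == 0 gives the linear case p_0."""
    if phi1 <= 0:
        return 0.0
    if phi1 >= f:
        return 1.0
    if alpha == 0:
        return phi1 / (f + gamma)
    c = alpha / gamma ** 2
    return (1 - math.exp(-c * phi1)) / (1 - math.exp(-c * (f + gamma)))


# Assumed sample data: fortunes 30 and 70, largest stake gamma = 1.
f1, f2, gamma = 30.0, 70.0, 1.0
f = f1 + f2

# alpha = 0: the value lies between p_0(f1) and 1 - p_0(f2).
low = p_alpha(f1, f, 0.0, gamma)
high = 1 - p_alpha(f2, f, 0.0, gamma)
print(low, high, high - low)      # the width is gamma / (f + gamma)

# alpha > 0: repeating I loses at most exp(-(alpha / gamma^2) * f1).
alpha = 0.1
slack = math.exp(-(alpha / gamma ** 2) * f1)
print(p_alpha(f1, f, alpha, gamma), 1 - slack)
```

With these numbers the $\alpha = 0$ interval has width $1/101$, while for $\alpha = 0.1$ the slack $e^{-3}$ is about 0.05, so the favored player wins with probability at least about 0.95.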

What if, instead of repeating $I$, player 1 repeats a $\delta$-optimal $I_\delta$, where $\delta$ is the smallest number for which $I_\delta$ is $\delta$-optimal? If $\alpha > \delta$, no great harm is done, since it can be verified by precisely the proof given above that this is an $\exp\{-[(\alpha - \delta)/\gamma^2] f_1\}$-optimal strategy for player 1. If, however, $\alpha < \delta$, player 2 can expect at least $1 - \exp\{-[(\delta - \alpha)/\gamma^2] f_2\}$ in payoff. When $[(\delta - \alpha)/\gamma^2] f_2$ is large, this payoff is close to 1, so that $I_\delta$ is not a good strategy. Thus, if $\alpha = 0$, no matter how small $\gamma$ is, it is not enough to repeat a $\delta$-optimal strategy for sufficiently small $\delta$. On the other hand, suppose that $(I_n)$ is a sequence of strategies for player 1 whose $n$th member is $\delta^n$-optimal for $\Gamma$ and satisfies

$$\min_j \Pr\{\Gamma_{I_n j} > 0\} \ge \beta' > 0,$$

where $\beta'$ does not depend on $n$. Then

Lemma 6.5. If $\alpha = 0$ and player 1 plays $I_n$ at the $n$th stage, then he can expect at least $p_0(f_1) - \delta/[(1 - \delta)(f + \gamma)]$ in payoff.

Proof. The proof is almost identical with that of Lemma 6.3, where, instead of proving

$$E\,p_0(F_1^N) \ge p_0(f_1),$$

one proves that

$$E\,p_0(F_1^N) \ge p_0(f_1) - \frac{\delta + \cdots + \delta^{N-1}}{f + \gamma}.$$
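The passage from the inequality in this proof to the statement of Lemma 6.5 is a geometric-series estimate. A short sketch, assuming $0 < \delta < 1$:

```latex
\[
E\,p_0(F_1^N)
\;\ge\; p_0(f_1) - \frac{\delta + \delta^2 + \cdots + \delta^{N-1}}{f + \gamma}
\;\ge\; p_0(f_1) - \frac{1}{f + \gamma}\sum_{k=1}^{\infty} \delta^k
\;=\; p_0(f_1) - \frac{\delta}{(1 - \delta)(f + \gamma)} .
\]
```

The right-hand side does not depend on $N$, so the bound survives the limit $N \to \infty$.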


It is now an easy step (left to the reader) to

Theorem 6.2. If $G(\Gamma) = \alpha > \delta$ and $\Omega(f_1, f_2)$ is inessential, repeating a strategy which is $\delta$-optimal for $\Gamma$ is $\exp\{-[(\alpha - \delta)/\gamma^2] f_1\}$-optimal for $\Omega(f_1, f_2)$. Let $G(\Gamma) = 0$, and let $(I_n)$ be a sequence of strategies for player 1 whose $n$th member is $\delta^n$-optimal for $\Gamma$ and satisfies

$$\min_j \Pr\{\Gamma_{I_n j} > 0\} \ge \beta' > 0,$$

where $\beta'$ does not depend on $n$. Then playing $I_n$ at the $n$th stage is a $\{[\gamma/(f + \gamma)] + [2\delta/(1 - \delta)(f + \gamma)]\}$-optimal strategy for player 1.

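The reader will observe that when each $\Gamma_{ij} \ne 0$, say, $|\Gamma_{ij}| \ge C$, we automatically have, for a $\delta^n$-optimal $I_n$, when $\delta$ is sufficiently small,

$$\min_j \Pr\{\Gamma_{I_n j} > 0\} \ge \frac{C - \delta^n}{C + \gamma} \ge \frac{C - \delta}{C + \gamma} > 0.$$

(Indeed, if $p = \Pr\{\Gamma_{I_n j} > 0\}$, then $-\delta^n \le E\,\Gamma_{I_n j} \le \gamma p - C(1 - p)$.)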


In closing, we wish to point out that the method of proof leading to Theorem 6.1 is trivially sufficient to handle the following generalized game of survival, in which the result of a play is a random state instead of a definite number. However, the method is apparently insufficient to handle more than a finite number of possible states or the possibility of "zeros." A finite set $\Sigma$ with two distinguished points, $\sigma_1$ and $\sigma_2$, is given. $\Sigma$ is partially ordered by $\le$, which satisfies, for some fixed $n$ and all $\{x_i \mid 1 \le i \le n\}$,

$$x_1 < x_2 < \cdots < x_{n-1} < x_n \;\rightarrow\; x_1 = \sigma_2,\; x_n = \sigma_1.$$

For each $x \in \Sigma$, there is a set of random variables on $\Sigma$, $\{Y_{ij}(x) \mid 1 \le i \le i_0,\ 1 \le j \le j_0\}$, such that for all $i$ and $j$, $Y_{ij}(\sigma_1) = \sigma_1$, $Y_{ij}(\sigma_2) = \sigma_2$; and for $x \ne \sigma_1, \sigma_2$,

$$\Pr\{Y_{ij}(x) < x\} = 0 \;\rightarrow\; \Pr\{x < Y_{ij}(x)\} = 1,$$

$$\Pr\{x < Y_{ij}(x)\} = 0 \;\rightarrow\; \Pr\{Y_{ij}(x) < x\} = 1.$$

In addition, for $x \ne \sigma_1, \sigma_2$, for each $i$, there is a $j$ such that

$$\Pr\{Y_{ij}(x) < x\} > 0;$$

and for each $j$, there is an $i$ such that

$$\Pr\{x < Y_{ij}(x)\} > 0.$$

Define $\prod_{n=1}^{N} Y^{(n)}_{i_n j_n}(x)$ by induction by

$$\prod_{n=1}^{N+1} Y^{(n)}_{i_n j_n}(x) = Y^{(N+1)}_{i_{N+1} j_{N+1}}\Big[\prod_{n=1}^{N} Y^{(n)}_{i_n j_n}(x)\Big],$$

where $\{Y^{(n)}_{ij}(x) \mid 1 \le i \le i_0,\ 1 \le j \le j_0,\ x \in \Sigma\}$ is a set of independent random variables, each distributed like $Y_{ij}(x)$. Then we finally require that $x \le x'$ implies that for all $N$,

$$\Pr\Big\{\prod_{n=1}^{N} Y^{(n)}_{i_n j_n}(x') = \sigma_1\Big\} \ge \Pr\Big\{\prod_{n=1}^{N} Y^{(n)}_{i_n j_n}(x) = \sigma_1\Big\},$$

$$\Pr\Big\{\prod_{n=1}^{N} Y^{(n)}_{i_n j_n}(x') = \sigma_2\Big\} \le \Pr\Big\{\prod_{n=1}^{N} Y^{(n)}_{i_n j_n}(x) = \sigma_2\Big\}.$$

All that we have said about $\{\Omega(f_1, f_2)\}$ up to Theorem 6.1, trivially modified, applies to the games $\{\Omega(x)\}$, in which two players repeatedly and simultaneously choose integers $i_n$ and $j_n$ at each time $n$, until $\prod_{n=1}^{N} Y^{(n)}_{i_n j_n}(x) = \sigma_1$ or $\sigma_2$, or ad infinitum, if this never occurs. The payoff is (1, 0) if the game ends in the state $\sigma_1$, and is (0, 1) if the game ends in the state $\sigma_2$. If the game goes on indefinitely, then the payoff is $[\alpha(C), \beta(C)]$, where $[\alpha(C), \beta(C)] \le (1, 1)$ and $\alpha(C) + \beta(C) \le 1$, and where $[\alpha(C), \beta(C)]$ can depend on the course of the game, $C$.

Similarly, Theorem 6.2 can be generalized by the use of expected values to the situation in which $\Sigma$ is a set of reals satisfying, for $\sigma_1 > x > \sigma_2$,

$$Y_{ij}(x) = x + a_{ij} \quad \text{if } \sigma_2 < x + a_{ij} < \sigma_1,$$

$$Y_{ij}(x) = \sigma_1 \quad \text{if } \sigma_1 \le x + a_{ij},$$

$$Y_{ij}(x) = \sigma_2 \quad \text{if } \sigma_2 \ge x + a_{ij},$$

where $a_{ij}$ is a real-valued random variable whose distribution depends on $(i, j)$.